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ABSTRACT 
This  work  investigates  some  problems  arising  in  application  of 
(Kalman)  linear  filter  theory  to  real  problems,  where  practical  es- 
timates must  replace  exact  theoretical  quantities  in  problem  form- 
ulation o    The  principle  objective  is  application  of  linear  filter  theory 
to  random  signal  detection/clctWif ft?8*Mlh .    However,  an  example  of 
classical  estimation,  error  estimation  in  shipboard  inertial  navigation 
systems,  is  offered  to  illustrate  general  points  discussed.    A  unified 
treatment  of  models  for  random  time  series  is  presented,  including  a 
comparative  review  of  models  which  have  been  proposed  and  pro- 
cedures for  obtaining  model  coefficients.    Correlation  detection  of 
deterministic  signals  is  discussed  and  the  resulting  principles  ex- 
tended to  the  case  of  random  signal  detection.    Application  of 
linear  filter  theory  to  the  problem  is  indicated.    Finally,  an  ex- 
perimental study  in  random  signal  detection/classification  is 
included.    Experimental  signals  used  are  hydrophone  recordings  of 
sea  noise  and  sea  noise  plus  diesel  submarine.    Consistency  of 
successful  results  obtained  suggests  practical  utility  of  method  in 
certain  random  signal  detection/classification  problems. 
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CHAPTER  I 
INTRODUCTION 

It  is  frequently  an  integral  requirement  in  engineering  information 
processing  systems  that  measurement  data  be  employed  in  subsequent 
decision  making  or  action.    Invariably,  the  processing  problem  lies 
in  a  stochastic  environment  because  of  inaccuracies  in  the  measure- 
ment process  or  random  perturbations  of  the  observed  process  during 
or  between  measurements.    Some  examples  are  determination  of 
satellite  orbit  parameters  from  noisy  radar  or  radio-location  data, 
target  track  determination  in  the  radar  fire  control  problem ,  filtering 
of  signals  received  in  the  presence  of  noise,  and  state  estimation  in 
linear  dynamic  systems  from  noisy  transducer  information. 

What  might  be  termed  "classical"  estimation  theory  in  engineer- 
ing has  been  based  upon  the  pioneering  work  of  Wiener  published 
in  1942.    This  theory  rests  upon  three  main  assumptions:    (i)  linear 
operations  on  received  data,   (ii)  a  quadratic  loss  criterion,  and 
(iii)  stationary  statistics  describing  the  signal  and  noise  processes. 
In  applications,  the  objective  is  to  obtain  a  specification  of  the 
electrical  filter  (the  Wiener  filter)  giving  optimum  estimation,  or 
isolation,  of  the  signal  from  corrupting  additive  noise.    Posed  in 
terms  of  integral  equations,  the  problem  results  in  the  Wiener-Hopf 
integral  equation,  which  must  be  solved  to  obtain  the  impulse  re- 
sponse of  the  optimum  filter.    Broad  application  of  the  theory  has 
suffered  from  the  difficulty  of  obtaining  solutions  for  the  Wiener- 
Hopf  equation. 

In  1960,  Kalman  posed  the  estimation  problem  in  terms  of 
differential  equations,  using  the  concept  of  "state"  from  linear 
system  theory.    Characterization  of  the  problem  in  terms  of 
differential  equations  offers  several  advantages .    Chief  among  them 
are  suitability  of  the  new  approach  for  machine  computation,  re- 
laxation of  the  stationarity  assumption,  and  the  intimate  relationship 
between  the  optimum  filter  and  a  linear  model  describing  the  signal 
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process.    The  work  here  is  motivated  by  the  contributions  of  Kalman 
and  considers  application  of  his  techniques  to  problems  in  statistical 
communication  theory. 

In  the  title,  the  term  "real-time"  means  that  computational  effort 
for  estimation  procedures  to  be  employed  does  not  increase  with  in- 
creased numbers  of  observations.    Further,  following  the  practice  of 
recent  years,  the  collective  term  "estimation"  is  considered  to  include 
the  traditional  problems  of  smoothing,  filtering,  and  prediction.    Since 
digital  computation  is  to  be  employed  in  examples,  and  since  the 
required  mathematics  is  less  obscure,  the  present  work  will  deal 
almost  exclusively  with  estimation  of  discrete  signals.    For  engineer- 
ing purposes,  little  generality  is  lost.    The  description  of  continuous 
filters  can  in  general  be  obtained  by  considering  the  limiting  case  of 
the  corresponding  discrete  filter  as  the  sampling  interval  is  allowed 
to  approach  zero. 

In  Chapter  II,  basic  assumptions  underlying  optimum  estimation 
theory  developed  by  Kalman  are  reviewed,  and  some  practical  con- 
siderations are  discussed.    As  an  example,  the  problem  of  estimating 
latitude  error  in  shipboard  inertial  navigation  systems  is  outlined, 
and  application  of  the  Kalman  filter  to  the  problem  is  described. 

Chapter  III  considers  the  important  problem  of  finite  parameter 
models  for  random  time  series,  a  subject  not  heretofore  treated  in 
a  unified  manner.    First,  the  Spectral  Distribution  Function,  intro- 
duced for  the  discrete  time  case  by  Wold  in  1938,  is  discussed  and 
its  three  constituent  distribution  functions  noted.    Then  models  which 
have  been  proposed  for  time  series  are  considered  with  regard  to  their 
capacities  for  representing  the  various  constituent  functions.    The 
combined  autoregressive-moving  average  model  is  shown  to  be 
equivalent  to  the  linear  filter,  and  the  relationship  between  their 
respective  parameters  is  indicated.    A  discussion  of  parameter  es- 
timation is  then  included.    Present  difficulties  in  solving  for  moving 
average  model  parameters  is  noted.    It  is  shown  that  the  approach  of 
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Ho  and  Lee  [16  ]  to  the  identification  problem  amounts  also  to  a 
recursive  means  for  estimating  autoregressive  model  coefficients. 
Finally,  the  recursive  approach  is  employed  in  an  example  to  obtain 
autoregressive  model  coefficients  for  a  sample  signal.    The  signal 
consists  of  a  50  cycle  sine  wave  combined  additively  with  noise  from 
a  laboratory  random  noise  generator.    A  plot  of  z-plane  poles  of  the 
model  is  given.    Also  included  are  comparative  graphs  of  the  auto- 
correlation functions  and  power  spectral  densities  derived  from  the 
signal  and  model  respectively. 

The  detection  of  random  signals  in  noise  is  considered  in  Chapter 
IV.    As  it  happens,  optimum  detection  of  Gaussian  random  signals 
constitutes  a  simple  (at  least  conceptually)  extension  to  the  idea  of 
correlation  detection  of  deterministic  signals.    In  the  case  of  Gaussian 
random  signals,  the  optimum  receiver  forms  a  cross  correlation  of  the 
received  signal  and  a  least  squares  estimate  of  the  Gaussian  signal 
given  the  received  signal .    The  extension  is  shown  following  a  brief 
development  of  correlation  detection  of  deterministic  signals.    Diffi- 
culties involved  in  directly  evaluating  the  Likelihood  Function  for 
each  alternative  classification  are  noted.    Then  employing  the  approach 
of  Schweppe  [31  ],  it  is  shown  that  the  Likelihood  Function  may  be 
evaluated  recursively.    For  random  signals  which  are  single  time 
series,  no  matrix  inversion  is  required.    Quantities  required  in  the 
recursion  relations  may  be  obtained  from  Kalman  filter  estimation  of 
the  Gaussian  signal  given  the  received  signal.    On-line  computations 
may  be  greatly  reduced  by  storage  of  short  lists  of  required  quantities 
which  do  not  depend  upon  the  received  signal.    Using  stored  lists, 
on-line  multiply-add  combinations  per  received  signal  sample  are 
reduced  to  approximately  2n  +  2  for  each  classification  alternative, 
where    n    is  the  order  of  signal  models  used. 

An  experimental  problem  in  random  signal  detection  investigated 
in  Chapter  V  represents  the  first  testing  against  actual  signals  of  the 
methods  developed  or  discussed  in  this  dissertation.    The  experimental 
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signals  used  are  hydrophone  recordings  of  (a)  a  diesel  submarine  plus 
sea  noise  and  (b)  sea  noise  alone.    After  obtaining  autoregressive 
models  for  the  signal-plus-noise  and  noise  processes,  recursive 
Likelihood  Function  evaluation  discussed  in  Chapter  IV  is  employed  to 
test  performance  of  optimum  linear  classification  on  actual  signals. 

Consistent  correct  classification  is  obtained  within  less  than  100 
samples  of  signal  record  (representing  less  than  60  millisecs  of  signal 
record).    Furthermore,  classification  efficiency  is  maintained  over 
rather  wide  deviations  between  actual  and  assumed  received  signal 
parameters.    Hence  initial  results  suggest  practical  applicability  of 
the  method  to  certain  real-time  random  signal  detection/classification 
problems . 
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CHAPTER  II 

LINEAR  LEAST  SQUARES  ESTIMATION  - 

THE  KALMAN  FILTER 

Interest  in  time-domain  estimation  techniques  and  applications 
has  been  stimulated  in  recent  years  by  the  contributions  of  R.  E. 
Kalman  [19  ].    Subsequent  investigators  have  related  Kalman's 
derivation  and  results  to  more  familiar  estimation  techniques,  thereby 
enhancing  general  understanding  and  utility  of  the  new  approach.     Per- 
haps the  simplest  derivation  of  Kalman's  results  lies  in  the  Bayesian 
approach  employed  by  Peschon  and  Weaver  [26  ],   [35  ].    Another 
step  toward  better  perspective  was  the  demonstration  by  Lee  [21  ] 
and  Fagin  [10  ]  of  the  equivalence  between  Kalman's  results  and 
generalized  recursive  least  squares  estimation.    A  unified  and 
general  treatment  of  modern  estimation  theory  by  Deutsch  [9  ]has 
recently  been  published. 

The  purpose  of  this  chapter  is  to  review  the  basic  assumptions 
underlying  Wiener-Kalman  filter  theory  and  implications  of  the 
assumptions.    Further,  practical  considerations  arising  in  appli- 
cations are  discussed.    In  this  regard,  a  comparative  discussion  of 
the  Kalman  filter  and  generalized,  recursive  estimation  of  a  Gaussian 
population  mean  is  introduced  to  explicitly  show  the  role  of  a  priori 
information.    Finally,  a  practical  problem  is  considered  to  illustrate 
an  application  of  the  methods  treated  here. 

I.     BASIC  ASSUMPTIONS 

The  two  basic  assumptions  underlying  optimal  filter  theory  are 
(1)  linear  operations  on  the  data  are  to  be  performed,  and  (2)  the 
criterion  for  goodness  is  mean  squared  error.    Under  these  assump- 
tions ,  it  is  well  known  that  only  second  order  statistical  moments 
are  employed  in  the  estimation  scheme  [8  ].    An  immediate  result  is 
that  for  non-Gaussian  signal  processes,  the  optimal  linear  estimate 
obtained  is  the  same  as  would  have  been  obtained  from  the  corres- 
ponding Gaussian  signal  process  (i.e.  possessing  the  same  first 
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two  moments).    Moreover,  the  optimal  linear  least  squares  estimate 
of  a  Gaussian  process  is  identical  with  the  Maximum  Likelihood 
estimate.    The  conclusion  follows  that  linear  least  squares  estimation 
may  be  improved  upon  only  for  non-Gaussian  processes,  and  then 
only  by  an  estimation  procedure  which  takes  into  account  third  or 
higher  statistical  moments  of  the  process.    Conversely,  in  dis- 
cussions restricted  to  linear  least  squares  estimation,  assumptions 
of  Gaussian  process  statistics  can  be  made  without  effect  on  the 
estimation  outcome. 

II.     THE  SIGNAL  PROCESS  MODEL 
It  is  commonly  assumed  in  engineering  literature  on  stochastic 
estimation  that  the  scalar  or  vector-valued  random  time  series  to  be 
processed  can  be  regarded  as  the  output  of  a  linear  dynamic  system 
excited  by  white  noise.    Implications  of  this  model  will  be  discussed 
in  greater  detail  in  Chapter  III.    It  is  also  assumed  that  observations 
of  the  system  output  are  obscured  by  additive  Gaussian  white  noise. 
Block  diagrams  of  continuous  and  discrete  descriptions  of  the  model 
are  given  in  Figs.  2-1  and  2-2  respectively. 

The  general  continuous  linear  dynamic  system  model  may  be 
described  by  the  vector  differential  equations 

x(t)  =  F(t)x(t)  +  G(t)w(t) 

y(t)  =  H(t)x(t) 

z(t)  =  y(t)  +  v(t) 


where 


x(t)  is  the  n  x  1  vector  of  system  states. 

F(t)  is  an  n  x  n  matrix  describing  the  homogeneous  system. 

w(t)  is  a  p  x  1  vector  of  Gaussian  white  noise 

G(t)  is  an  n  x  p  matrix  which  distributes  excitation  noise 

across  the  states. 
y(t)  is  an  m  x  1  vector  of  system  output. 
H(t)  is  an  m  x  n  observation  matrix. 
v(t)  is  an  m.  x  1  vector  of  white  Gaussian  measurement 

noise 
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Block  Diagram  of  General  Continuous 
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Fig.  2-1 
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Block  Diagram  of  General  Discretely 
Described  Linear  Dynamic  System 
Fig.  2-2 
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z(t)  is  an  m  x  1  vector  of  observations. 
Excitation  and  measurement  noise  sources  are  assumed  to  be  indepen- 
dent.   Characteristics  of  the  noise  vectors   w(t)    and    v(t)    are: 

E[w(t)]=E[v(t)]=  0 

E[w(t)w(T)T]=    (Q    t==T 
lO    t^T 

E[v(t)v(T)T]=     {*    *  =  T 
LO    t  f  T 

For  the  corresponding  discretely  described  linear  dynamic  system, 

the  describing  vector  difference  equations  are 

x(k)  =  $(k)x(k-l)  +  T(k)w(k)-1) 

y(k)  =  H(k)x(k) 

z(k)  =y(k)  +  v(k) 
Corresponding  to  the  continuous  case,  the  noise  vectors  w(k)  and 
y(k)  are  assumed  to  be  samples  of  a  vector-valued  Gaussian  noise 
process,  and  are  assumed  to  be  independent  from  sample  to  sample. 
For  w(k),  this  assumption  of  uncorrected  samples  represents  no  loss 
of  generality,  since  the  excitation  noise  is  fictitious  anyway.    On 
the  other  hand,  measurement  noise  encountered  in  certain  applications 
may  not  be  totally  uncorrelated  from  sample  to  sample.    In  such  cases, 
the  correlated  portion  may  be  accounted  for  by  augmenting  the  system 
order  as  necessary.    The  form  of  the  resulting  augmented  system  is 
simply  arrived  at.    Actual  numbers  to  describe  the  measurement  error 
correlation  may  be  obtained  by  considering  the  points  discussed  in 
Chapter  III . 

III.     THE  OPTIMAL  FILTER 
References  cited  above  include  derivations  of  the  Kalman  filter 
employing  variously  geometric  (orthogonal  projections  in  Hilbert 
space),  Bayesian,  and  Maximum  Likelihood  approaches .    Addition- 
ally, equivalence  of  the  optimal  filter  and  recursive  linear  least 
squares  prediction  has  been  indicated.    It  will  therefore  suffice  for 
the  work  to  follow  to  include  here  only  the  equations  describing  the 
optimal  filter  and  the  corresponding  block  diagram  (Fig.  2-3).    The 
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equations  given  are  for  the  discrete  formulation. 

B('k)  =  P(k/k-l)HT(HP(k/k-l)HT  +  R)"1 

x(k/k)  =  <fcx(k-l/k-l)  +K(k)[z(k)  -  H  &x(k-l/k-l)] 

P(k/k)  =  [I  -  K(k)H]P(k/k-l) 

P(k+l/k)  =  *P(k/k)  4  T  +  T  Q  TT 

In  the  equations  given,    P    is  an  n  x  n  covariance  matrix  of  the 
estimate.    The  double  index  is  used  to  mean  that  P(k/k-l)  is  the 
variance  of  x(k/k-l)/  the  optimal  state  estimate  at  time  increment    k 
given  data  only  up  to  time    k-1     (x(k/k-l)  =  $x(k-l/k-l)). 

While  the  equations  above  describe  optimal  linear  estimation  of 
the  signal  process  model  states,  what  frequently  is  required  is  the 
optimal  estimate  of  some  linear  combination  of  the  states,  for  ex- 
ample   y  =  Hx.    In  order  that  the  equations  above  be  useful  in  such 
a  case,  it  must  be  so  that  the  optimal  estimate  of  a  linear  combin- 
ation of  states  is  the  same  linear  combination  of  the  optimal  estimate 
of  the  states.    A  proof  of  this  point  is  included  in  Appendix  I. 

IV.    A  PRIORI  INFORMATION 

As  is  readily  noted,  the  difference  equations  above  which  de- 
scribe behavior  of  the  Kalman  filter  or  recursive  least  squares  es- 
timation require  a  starting  point.    What  is  needed  is  an  initial 
estimate  of  the  states  x(l/0),  and  an  initial  covariance  matrix 
P(l/0),  indicating  the  uncertainty  in  the  initial  estimate.    These 
quantities  represent  a  priori  knowledge  about  the  process  to  be  es- 
timated since  they  are  required  to  be  specified  before  any  obser- 
vations are  made.    An  intuitive  feel  for  the  role  of    x(l/0)    and    P(l/0) 
may  be  gained  by  comparing  behavior  of  the  Kalman  filter  and  gen- 
eralized recursive  estimation  of  a  Gaussian  population  mean,  out- 
lined in  Appendix  II.    It  is  shown  there  that  a  priori  information 
appears  to  the  filter  just  as  additional  equivalent  observations  of  the 
process  to  be  estimated. 

It  is  difficult  in  many  problems  to  arrive  at  meaningful  values  for 
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the  a  priori  quantities  required.    If  the  signal  process  model  is  stable 
and  the  excitation  noise  covariance    Q    is  known,  a  priori  knowledge 
may  be  obtained  directly  from  knowledge  of  the  model.    Motion  of  states 
of  the  model  is  a  Gaussian  random  process,  a  linear  function  of  the 
Gaussian  excitation  process.    Since  the  excitation  has  zero  mean,  the 
expected  value  of  states  of  the  model  given  no  observations  is  also 
zero.    Thus    x(l/0)    equals  the  zero  vector.    Further,  the  covariance 
describing  uncertainty  in  state  values  is  the  solution  matrix    S    of 
the  difference  equation 

s  =  $s$T  +  rQrT 

At  any  stage,  the  matrix    P(k/k-l)    is  defined  as  the  error  co- 
variance  matrix  at  time    k    given  observations  to  time    k-1.    Hence 

P(l/0)  =  E[(x(l)  -  x(l/0))(x(l)  -  x(l/0))T] 
=  E[x(l)x(l)T] 
=  S 

Another  suggested  approach  when  a  priori  requirements  present 
difficulty  is  to  specify  some  reasonable  value  of  x(l/0)    and  let 
P(l/0)  =  cl,  where    c    is  a  large  number.    It  is  motivated  by  the  fact 
that  weighting  of  measurement  data  is  in  inverse  proportion  to  un- 
certainty in  the  data  relative  to  uncertainty  of  the  current  best  estimate 
before  data  receipt.    Thus  effects  of  the  initial  assumptions  may  be 
washed  out  by  the  incorporation  of  new  data  [21],   [10]. 

In  the  example  below,  both  the  above  methods  are  tried  and  com- 
parative results  noted.    Further  use  of  the  two  methods  is  made  in 
the  experimental  study  of  Chapter  V.    There  the  latter  method  is  used 
in  the  recursive  model  determination  program,  and  the  former  is  used 
in  the  program  for  computation  of  Likelihood  Functions . 
V.    AN  EXAMPLE  -  ESTIMATION  OF  SYSTEM 
ERROR  IN  SHIPBOARD  INERTIAL  NAVIGATION  SYSTEMS 

An  example  to  which  the  techniques  described  above  may  be  very 
naturally  applied  is  the  problem  of  error  estimation  in  shipboard  inertial 
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navigation  systems  (SINS).    Gyroscopes  used  in  the  system  to  provide 
a  stable  inertial  platform  are  subject  to  certain  random  drifts.    Hence 
in  order  to  maintain  the  necessary  accuracy  of  the  system  over  in- 
definite periods  at  sea,  it  is  necessary  to  determine  system  errors 
for  use  in  recalibrating  or  resetting  the  SINS.     Such  an  error  deter- 
mination necessarily  depends  upon  information  from  external  sources; 
in  this  case  upon  LORAN  position  information  and  position/heading 
information  from  star  sightings,  etc.    However,  information  from  these 
sources  is  imperfect  due  to  measurement  inaccuracies.    Additionally, 
it  is  desired  to  have  to  resort  to  LORAN  or  other  observations  as 
infrequently  as  possible,  so  optimal  use  of  observational  data  is 
indicated.    These  conditions,  optimal  estimation  from  noisy  obser- 
vations, comprise  justification  for  considering  the  problem  here. 

The  problem  will  be  pursued  here  only  far  enough  to  illustrate 
application  of  the  method.    Thus  only  the  problem  of  estimating 
latitude  error  will  be  discussed  in  detail.    Without  further  theo- 
retical complication,  the  treatment  may  be  expanded  to  encompass 
remaining  aspects  of  the  error  estimation  problem.    Optimal  control 
techniques  may  then  be  employed  in  system  reset,  a  step  under 
current  development  [5  ]  . 
1 .      Background 

As  stated  above,  SINS  employs  gyroscopes  to  stabilize  a  gimbal- 
supported  inertial  platform.    Remarks  to  follow  will  relate  to  the 
four-gimbal  system-1- ,  the  innermost  two  gimbals  of  which  are  termed 
the  "latitude  gimbal"  and  "heading  gimbal."    Upon  the  innermost 
gimbal,  the  latitude  gimbal,  are  mounted  three  stabilizing  single- 
degree-of-freedom  gyros  with  mutually  perpendicular  input  axes. 

■'-The  present  discussion  will  be  only  detailed  enough  to  outline 
the  estimation  problem  to  be  considered.    More  detailed  discussion 
of  inertial  navigation  systems  may  be  found  in  [24  ],   [32  ]. 
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This  basic  structure  will  be  discussed  further  below. 

Two  coordinate  systems  fixed  in  the  gimbals  describe  operation 
of  the  system.    They  have  been  termed  the  "instrumented  equatorial 
coordinate  system",  fixed  in  the  latitude  gimbal  with  axes  E' ,  Y' , 
and  P1 ,  and  the  "instrumented  local  coordinate  system",  fixed  in 
the  heading  gimbal  with  axes  x' ,  Y' ,  and  z1 .    These  axes  correspond 
to  "earth  axes"  E,  Y,  P,  x,  and  z  shown  in  Fig.  2-4.    The  latitude  and 
heading  gimbals  are  pivoted  to  each  other  on  the  Y1  axis. 
Identification  of  axes  shown  in  Fig.  2-4  follows: 

E  is  parallel  to  intersection  of  local  meridian  plane  with 

plane  of  equator,  positive  outward. 
Y  is  in  local  horizontal  plane  and  extends  eastward. 
1?  P  is  parallel  to  polar  axis  of  earth,  positive  toward 

North  pole . 
x  is  in  local  horizontal  plane  and  extends  northward, 
z  is  parallel  to  local  vertical,  positive  downward. 
Having  introduced  the  coordinate  systems  employed  in  SINS, 
orientation  of  the  siabilizing  gyros  may  be  more  completely  described. 
As  stated,  they  are  mounted  on  the  innermost,  or  latitude  gimbal, 
in  which  is  fixed  the  instrumented  equatorial  coordinate  system.    The 
gyros  are  termed  the  equatorial,  latitude,  and  polar  gyros,  and  are 
mounted  with  their  input  axes  parallel  to  the  E' ,  Y1 ,  and  P'  axes 
respectively.    Drift  rates  of  the  three  gyros  are  denoted,  respectively, 
by    €£,  cL,  and  €p. 

In  order  that  the  instrumented  equatorial  coordinate  system 
correspond  accurately  with  the  earth's  equatorial  coordinate  system, 
the  latitude  gimbal  is  torqued  about  the  P'  axis  at  the  rate  Q+  \  where 

Q=  Earth  rate 

X  =  Ship  longitude  rate  obtained  via  an  accelerometer  on 
the  inertial  platform. 
Additionally,  the  heading  gimbal  is  torqued  with  respect  to  the  latitude 
gimbal  at  a  rate    (-L1)    equal  to  the  negative  of  the  SINS  indicated 
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Equatorial  and  Local  Coordinate  Systems 
Fig.  2-4 


Coordinate  System  Misalignment 
Fig.  2-5 
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latitude  rate  (also  obtained  via  an  accelerometer  on  the  inertial 
platform).    In  operation,  the  system  indicated  latitude  is  given  by  the 
angle  between  the  latitude  and  heading  gyros ,  and  indicated  longitude 
is  given  by  the  angle  between  the  latitude  gimbal  and  a  reference. 

This  completes  a  brief  description  of  system  operation. 
2 .      Error  Generation 

For  the  present  discussion,  operation  is  assumed  to  be  in  "damped 
inertial"  mode,  reducing  Schuler  oscillation  errors  to  comparatively 
insignificant  dimensions.    Nevertheless,  it  remains  necessary  to 
estimate  errors  resulting  from  imperfect  system  resets  and  the  gyro 
drifts    cr,  c    ,  and  ep    mentioned  above. 

When  the  system  is  misaligned,  the  instrumented  and  earth  equa- 
torial coordinate  systems  do  not  coincide  as  desired.     Misalignment 
consists  of  small  angular  displacements    0      about  the    Y   axis  and 
0£    about  the    E    axis.    The  situation  is  depicted  in  Fig.  2-5,  where 
the  vector  P1  has  unit  length. 

In  the  absence  of  gyro  drift,  the  P'  axis  is  motionless  relative  to 
inertial  space,  while  the  earth  coordinate  system  rotates  about    P 
at  the  earth  rate.    The  projection  of    P'    in  the  equatorial  plane  thus 
describes  a  circle  centered  at  the  origin  and  repeated  every  24  hours. 
Differential  equations  describing  the  circle  are 

0L    =    -fi0E 

0£       =    0,0-^ 

Introducing  gyro  drift,  0  ^    acquires  an  additional  rate  of  change 
equal  to  cT  ,  the  latitude  gyro  drift.    Similarly,  0E    acquires  an  addition- 
al rate  of  change  equal  to    cE  caused  by  equatorial  gyro  drift.    The 
equations  become 

0*L     =     -fi0E  +  €L 
0E     =     O0L  +  €£ 

If  it  is  assumed  that  the  drift  rates    e,    and  eP    are  constant,  the 
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equations  may  be  written 


d 
dt 

(0L 

+ 

fi 
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-  fi(  0E   - 
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The  locus  is  still  a  circle  but  is  centered  off  the  origin  as  shown  in 

Fig.  2-6. 

3 .      Error  Estimation 

Importance  of  the  error  locus  lies  in  its  relation  to  physically 
measurable  errors.    Under  the  common  assumption  of  perfect  knowledge 
of  the  local  vertical  (i.e.    z1  and    z    vertical),  it  has  been  shown 
[32  ]  that 

0L    -    CL 


0E    =    £  HcosL 


where 


£  L  =    Latitude  error 

£  H  =    Heading  error 

In  general,  only  position  information  is  conveniently  and  regularly 

available.    Hence  only    0      can  be  directly  estimated.    By  use  of 

three  or  more  latitude  error  observations,  the  circle  may  be  determined 

to  within  the    Y    axis  displacement    €L     , 

fi 

Current  operating  procedure  involves  a  manual  graphical  fit  of 
a  24  hour  period  sine  wave  through  plotted  succeeding  values  of 
latitude  error  versus  time.    The  method  is  patently  both  time  con- 
suming and  vulnerable  to  human  error.    In  the  discussion  below, 
this  same  reduced  problem  will  be  treated  with  a  recursive  least 
squares  procedure  (Kalman  filter).    While,  as  stated,  no  further 
theoretical  complication  is  involved  if  treatment  is  extended  to  the 
complete  problem,  all  essential  features  needed  to  show  the 
application  are  present  in  the  reduced  problem  unobscured  by 
arithmetic. 
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Error  Locus  with  Constant  Gyro  Drifts 
Fig.  2-6 
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Linear  Model  for  Latitude  Error 
Fig.  2-7 
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4.      Recursive  Least  Squares  Error  Estimation 

In  order  to  employ  the  Kalman  filter,  it  is  necessary  to  specify  a 
model  for  the  observed  process  in  terms  of  a  linear  dynamic  system 
excited  by  white  noise.    Furthermore,  it  is  necessary  to  possess  a 
priori  information  about  the  amount  of  noise  contaminating  obser- 
vations and  about  the  mean  squared  value  of  the  observed  process. 

Since  only  estimation  of  latitude  error  (which  equals  ^T )  is  to 
be  considered  in  the  reduced  problem,  a  simple  form  for  the  modeil 
may  be  written  by  inspection.    It  appears  in  block  diagram  form  in 
Fig.  2-7.    The  error  locus  of  Fig.  2-6  may  be  considered  to  be  pro- 
jected upon  the  E  axis.    Hence  error  estimation  in  the  reduced 
problem  amounts  to  estimation  of  the  unknown  amplitude,  phase, 

and  offset    (  -  €E  ) .    The  singular  model  of  Fig.  2-7  has  a  null  G 
Q 

matrix,  and  its  behavior  is  thus  determined  solely  by  the  unknown 

initial  conditions.    In  Fig.  2-7,  the  pure  integrator  leg  preserves  the 

initial  offset;,  and  the  sfecorid  order  leg rre presents  the  24  hour  period 

sine  wave  of  unknown  amplitude  and  phase.    Using  this  model,  a 

priori  quantities  used  in  the  filter  are    x(l/0)  =  0    and    P(l/0)  =  10,0001. 

The  model  of  Fig.  2-7  was  converted  to  discrete  form  suitable  for 

digital  computation.    The  resulting  matrices  describing  it  are  given 

below  (for    T  =  1  hour) . 


r  = 


*   = 


1 

0 

0 

1 

0 

.9659 

.9886 

H  = 

1 

0 

-.0678 

.9659 

0 

In  order  to  obtain  a  priori  quantities  in  the  alternate  manner 
noted  above,  the  model  of  Fig.  2-7  may  be  made  absolutely  stable 
by  the  addition  of  damping  in  each  leg.    The  precise  24  hour  period 
of  oscillation  and  essentially  constant  offset  (constant  drift  rate) 
suggests  that  only  slight  damping  be  used.    Then  a  value  of    q    , 
variance  of  the  white  excitation  noise,  is  specified  such  as  to 
obtain  steady  state  peak  oscillation  of  about  5-10  minutes  of  arc, 
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a  rough  mean  peak  oscillation  observed  in  practice. 

In  the  simulation,  latitude  measurements  were  assumed  available 
every  three  hours,  and  were  all  assumed  to  be  of  the  same  quality. 
Assumed  standard  deviation  of  measurement  noise  was  five.    It  was 
assumed  that  an  updated  latitude  error  estimate,  with  its  corres- 
ponding variance,  was  desired  hourly.    Hence  updating  without  data 
is  done  in  the  intervening  hours  between  receipt  of  measurements. 

Filter  computations  are  straightforward,  and  need  not  be  elab- 
orated upon.    Results  of  the  simulation  are  displayed  in  Figs.  2-8, 
2-9  for  the  unexcited  model,  and  Figs.  2-10,  2-11  for  the  stable 
model.    Comparison  of  the  resulting  estimates  shows  that  both 
models  achieved  approximately  the  same  estimation  accuracy.    The 
simulation  program  listing  is  included  in  Appendix  III. 
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Fig.  2-8 
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Estimated  and  Actual  Latitude  Error  -  Unexcited  Model 

Fig.  2-9 
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CHAPTER  III 

MODELS  FOR  RANDOM  TIME  SERIES 

I.     GENERAL 

The  purpose  of  this  chapter  is  to  treat  some  valid  questions  which 
arise  when  application  of  Kalman  estimation  techniques  is  considered 
for  problems  in  random  signal  detection  and  pattern  recognition.    In 
such  problems,  the  nature  of  possible  random  time  series  to  be  pro- 
cessed becomes  quite  general,  and  some  uncertainties  faced  are: 

(i) .         Is  it  still  a  justifiable  assumption  that  the  time  series 
to  be  processed  is  the  realization  of  a  Gaussian  process? 

(ii) .        Can  it  be  expected  that  an  acceptable  finite-parameter 
representation  of  the  signal  process  may  be  found? 

(iii) .      What  finite-parameter  schemes  have  been  considered  in 
the  literature  on  time  series  analysis  ?- 

(iv) .       What  schemes  are  available  for  obtaining  the  various 
finite-parameter  representations  ? 

In  considering  further  the  questions  above,  it  will  be  helpful 
to  review  pertinent  work  done  in  the  field  of  time  series  analysis. 
In  recent  years,  works  of  Tukey  [4  ],  Grenander  and  Rosenblatt 
[11,  12  ],  Parzen  [25  ],  and  Wold  [39  ]  are  representative  and 
contain  references  to  additional  literature  on  the  subject.    The  central 
problem  in  time  aeries  analysis  is  the  following:    one  is  allowed  to 
observe  a  time  series,  a  partial  realization  of  a  stochastic  process, 
and  on  the  basis  of  this  observation  is  required  to  make  statistical 
inference  about  the  time  series.    In  general,  the  required  inference 
concerns  the  possible  mechanism  generating  the  time  series  or  pre- 
diction of  future  behavior  of  the  series  . 

Work  done  to  the  present  in  time  series  analysis  has  primarily 
dealt  with  single  time  series 1,  resulting  from  stationary  (and  ergodic) 

It  is  only  the  single  time  series,  i.e.  the  single-output  signal 
generating  process,  which  it  is  desired  to  consider  here.    Certain 
work  has  been  done  in  the  analysis  and  model  determination  of 
multivariate  time  series,  and  may  be  found  in  [3  ],   [40  ],   [30  ]. 
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random  processes  with  finite  first  and  second  moments.    In  fact,  the 
great  majority  of  effort  in  the  analysis  of  random  time  series  has  been 
directed  toward  spectral  analysis  or  correlation  analysis,  employing 
only  the  first  and  second  moments  of  the  generating  process.    This 
has  come  about  for  two  reasons.    First,  it  is  possible  to  accomplish 
a  great  deal  with  only  second  order  quantities,  and  second,  the  use 
of  higher  order  moments  introduces  as  yet  intractable  difficulties 
[13  ].    Thus  the  theory  which  has  been  developed  has  been  tailored 
to  fit  the  methods  used.    Of  course  in  applying  the  theory,  it  is 
necessary  to  keep  in  mind  that  there  are  time  series  about  whose 
statistical  behavior  only  limited  information  is  conveyed  by  the 
correlation  or  spectrum.    Two  time  series  may  have  the  same  auto- 
correlation function  or  power  spectral  density,  yet  possess  quite 
different  statistical  properties.    A  classic  example  is  provided  by 
the  so-called  random  telegraph  signal  and  the  output  of  a  proper 
low-pass  filter  excited  by  Gaussian  white  noise. 

From  the  above  discussion  and  the  fact  that  the  Kalman  tech- 
nique employs  linear  processing,  the  answer  to  (i)  is  clear.    As 
long  as  the  processing  scheme  to  be  used  employs  only  the  first 
two  moments  of  the  time  series  to  be  analyzed,  then  without  loss  of 
generality,  the  subject  time  series  may  be  replaced  by  any  time  series 
having  the  same  autocorrelation  function  [6  ].    In  particular,  it  may 
be  replaced  by  a  Gaussian  time  series,  a  sample  function  of  a 
Gaussian  random  process,  which  is  completely  described  by  its 
first  and  second  moments . 

Remaining  sections  of  this  chapter  will  deal  with  the  questions 
of  finite  parameter  representations  of  time  series  and  methods  of 
estimating  the  parameters. 

II.     FUNDAMENTAL  MODELS  FOR  RANDOM  TIME  SERIES 

As  stated  above,  the  principal  matter  of  concern  in  time  series 
analysis  is  the  making  of  inferences  about  the  structure  of  a  random 
process     {y(t)  }   based  on  observation  of  a  partial  realization  of  the 
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process,  a  time  series    y(l),  y(2),    .    .    .  y(N).    In  general,  analysis 
requires  a  mathematical  model  of  the  time  series.    For  practical  reasons, 
it  is  highly  desirable  that  the  model  be  of  finite-parameter  form,  and 
moreover,  that  it  be  of  as  low  an  order  as  possible. 

Since  attention  here  is  being  restricted  to  linear  processing 
schemes,  for  which  as  seen  above,  the  random  series  may  be  adequate- 
ly represented  by  its  first  and  second  order  moments,  use  in  analysis 
of  either  the  power  spectrum  or  autocorrelation  function  is  equally 
valid.    For  investigating  modeling  potentialities,  it  turns  out  to  be 
more  convenient  to  deal  with  the  power  spectrum  than  the  covariance 
sequence.  *■ 

In  1938,  Wold  [38  ]  s how 4dl  that  for  the  discrete  time  case, 
there  exists  a  non-decreasing  bounded  function    3  (o$  ,  defined  for 
-tt  £  a)  ^  tt  ,  such  that 

7T 


=  1 


R(t)=   J    e^Td3(u)  T   =    0,  ±1, 

The  function    3  (  lc)    is  called  the  Spectral  Distribution  Function  of  the 
time  series.    This  theorem  by  Wold  corresponds  to  that  of  Khintchine 
in  1933  for  the  continuous  case,  where  he  showed  that  there  exists 
a  non-decreasing  bounded  function    3  ( w)  ,  defined  for  -®£   u>  s   °°  / 
such  that 


R(t)  =   J  ej  WT  d  3(o>) 


3  (u>)    can  be  uniquely  written  [5  ]  as  the  sum 
3  (u>)  =  3ac(u>)    +    3  ft(  w)    +    Bsc(a>) 
where  the  three  distribution  functions  have  the  following  properties. 

The  work  here  will  deal  with  discrete-time  random  time  series. 
Though  the  supporting  theory  has  been  developed  for  the  continuous 
time  case,  it  is  considerably  more  complicated  [3  ]  than  that  for 
discrete-time  sequences.    Besides,  for  engineering  purposes,  useful 
results  in  the  continuous  case  may  often  be  had  by  taking  discrete- 
time  results  to  the  limit. 
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3ac(u>)    is  absolutely  continuous  and  is  the  integral  of  a  non-negative 
function    F(oj)    called  the  Spectral  Density  Function  or  Power  Spectral 
Density  of  the  time  series.    The  function    3d(a0    is  a  discrete  func- 
tion which  represents  the  contribution  of  power  at  discrete  frequencies 
to  the  Spectral  Distribution  Function  of  the  time  series.    It  may  be  rep- 
resented as 

3d(u>)    =    I  A  3(u>i) 

where 

A  3  (wi)    =    3  (o>i  +  0)    -    3  (wi  -  0) 

The  remaining  constituent  function    3sc(u>)    is  a  singular  continuous 
function,  constant  except  on  a  set  of  Lebesgue  measure  zero  [11  ]  . 
It  has  been  asserted  [13  ]  that  this  part  does  not  appear  to  be  mean- 
ingful observationally,  and  it  is  commonly  neglected  in  the  literature. 

Having  noted  the  elemental  parts  of  the  general  Spectral  Dis- 
tribution Function    3  (to)/  it  will  be  constructive  to  examine  models 
which  have  been  proposed  for  time  series  and  to  consider  their 
capacities  for  representing  the  various  constituent  functions. 
Method  of  Hidden  Periodicities 

One  of  the  earliest  models  in  the  history  of  time  series  analysis, 
the  so-called  scheme  of  hidden  periodicities  was  introduced  in  1898 
by  Schuster  in  connection  with  meteorological  studies  .    The  model 
assumes  that  the  observed  random  sequence    {  y(k)  }   may  be 
represented  as  the  sum  of  a  mean  value  function    m(k)    and  white 
noise    v(k) 

y(k)  -  m(k)  +  v(k) 

where  the  mean  value  function    m(k)    represents  a  systematic 

oscillation 

m 

m(k)    =    I    A.    cos(o),k    +    0  .  ) . 
i=l      iii 

In  the  model,  the  amplitudes    A,  ,  angular  frequencies    &.,  and 
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phase  angles    0.    are  constants,  some  of  which  are  known  and  some 
to  be  estimated.    The  samples  of  white  noise    v(k)    have  the  properties: 

E[v(k)]   =    0  E[v(k)vO)]   =    {f       *T"J 

Obviously,  the  spectral  distribution  function  of  any  signal  gen- 
erated by  such  a  harmonic  model  would  correspond  only  to  the  dis- 
crete spectral  distribution  function  above.    But  time  series  observed 
in  nature  generally  possess  at  least  a  mixed  spectrum  [11  ]  ,  whose 
spectral  distribution  function  contains  both  the  discrete  and  abso- 
lutely continuous  components.    Furthermore,  to  a  good  approximation, 
many  physically  observed  time  series  may  be  considered  to  have 
spectra  corresponding  only  to  the  absolutely  continuous  spectral 
distribution  function    3ac(oj).    Hence,  as  might  now  be  predicted 
by  the  theory  (still  nonexistent  at  the  time  the  scheme  of  hidden 
periodicities  was  introduced),  such  a  harmonic  model  did  not  perform 
satisfactorily  for  many  time  series  analysis  problems  [39  ]. 
Linear  Models 

Of  greater  practical  interest  are  the  various  linear  models  which 
have  been  proposed.    The  spectra  of  such  models  correspond  to  ab- 
solutely continuous  spectral  distribution  functions.    This  class  of 
models  includes  the  Autoregressive  and  Moving  Average  schemes 
introduced  in  the  late  1920's  by  Yule  and  Slutzky,  and  the  linear 
filter  excited  by  white  Gaussian  noise  used  by  Rice  [29  ].    Models 
of  the  class  are  defined  below,  and  techniques  for  estimating  their 
parameters  are  considered  in  the  next  section. 
Autoregressive  Process 

The  current  value  of  the  time  series  is  represented  as  the  sum 
of  a  linear  combination  of  the  past    n    values  of  the  time  series  and 
an  uncorrelated  disturbance    w(k)    ,  where    n    is  the  order  of  the 
process  . 
In  difference  equation  form,  the  autoregressive  process  is 

y(k)  =  b^Ck-1)  +  b2y(k-2)  +  .   .   .  bny(k-n)  +  w(k) 
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where  the  disturbance    w(k)    has  the  properties , 

E[w(k)]=0  E[w(k)w(j)]=/a2    k  =  j 

k  M 


% 


A  necessary  and  sufficient  condition  that  the  model  be  convergent  is 
that  all  roots    z^    of  the  polynomial 

z       +    b  ,  z   ~     +...b      ,     z  +  b 
1  n-1  n 

lie  within  the  unit  circle. 
Process  of  Moving  Averages 

For  a  model  of  order   m,   the  current  value  of  the  time  series  is 
represented  as  a  linear  combination  of  the  past    m    values  of  the 
uncorrelated  disturbance.    In  eguation  form, 

y(k)  =  a     w(k)  +  a     w(k-l)  +  .   .   .    a     w(k-m) 
0  1  m 

where    w(k)    is  defined  as  for  the  autoregressive  model  above. 
Linear  Filter  Model 

In  discrete-time  formulation,  this  model  takes  the  form 

x(k)  =  $x(k-l)  +  T  w(k) 

y(k)  =  Hx(k) 
where 

x(k)  =  Vector  of  states  at  time    k    of  the  linear  dynamic  system 
comprising  the  filter. 

$  =      State  transition  matrix  of  the  linear  dynamic  system. 

r  =       Vector  which  distributes  the  excitation  noise  across  states 
of  the  filter. 

H    =     Observation  vector,  which  relates  the  scalar  observable 
to  states  of  the  filter. 

y(k)  =  Value  of  the  random  time  series  at  time    k. 

w(k)  =  White  Gaussian  excitation  noise  as  defined  above. 

Models    a    and    b    above,  on  the  one  hand,  and  model    c    on  the 
other,  have  been  employed  by  different  groups  of  researchers  for 
seemingly  very  different  problems  .    Models    a    and    b,  though  initially 
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proposed  for  studies  of  sunspot  activity,  have  in  recent  years  been 
most  extensively  employed  in  econometrics  research  and  have  to  a 
degree  come  to  be  associated  with  that  field  of  study.    Conversely, 
the  linear  filter  model  has  been  studied  almost  exclusively  by  the 
engineering  community  concerned  with  random  signal  representation 
and  stochastic  control  problems.    It  is  important,  however,  to  note 
their  basic  similarities. 

The  picture  may  be  cleared  by  noting  the  capacity  of  each  model 
to  approximate  the  power  spectral  density  of  a  time  series  to  be 
modeled.    As  stated  above,  the  spectra  of  linear  models  correspond 
to    3ac(a))/  the  absolutely  continuous  spectral  distribution  function. 
The  related  power  spectral  density    F(u>)  may  be  approximated  ar- 
bitrarily closely  by    a  rational  function  in    oj      of  finite  order  [12  ]. 
Now  in  the    z    domain,  for  a  spectral  density  which  is  approximately 
zero  beyond  the  appropriate  Nyquist  frequency,  there  is  a  corres- 
ponding spectral  density  function  rational  in    z,  so  that 

A(z)=l^- 
Q(z) 

where    P    and    Q    are  polynomials . 

Since  the  present  discussion  concerns  discrete-time  analysis, 
it  is  the  relation  between  the  linear  models  above  and    A(z)    which 
is  of  interest.    Also,  since  the  time  series  to  be  dealt  with  are  real, 
A(z)    may  be  written  as  follows  for    z    on  the  unit  circle: 

A(z)  =  -PfeL    =     A{z)A(8) 
Q(z)  B(z)B(i) 

z 


where 


A(z)  =a   zm  +  a   zm_1    +  .   . 


a 


0  1  m 


^*  \      i        n      ,        n-i  . 

g(z)  =  b  z      +  b  z  +  .   .   .  b 

0  1  n 


Whittle  [36  ]  has  shown  that  a  rational  spectral  function  where    A    has 
order    m    and    B    has  order    n    always  corresponds  to  a  process  structure 
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bx(k)    +    b  x(k-l)    +  .    .   .    b  x(k-n) 
0  1  n 

=  a  w(k)  +  a  w(k-l)  +  .   .   .  a    w(k-m) 
0  1  m 

Such  a  model  structure  has  been  termed  a  mixed  autoregressive- 
moving  average  model,  the  obvious  title. 

From  the  discussion  above,  relationships  between  the  linear 
models  mentioned  and  the  power  spectral  density  is  clear.    The  linear 
filter  model  and  mixed  autoregressive-moving  average  model  are 
equivalent  and  provide  a  rational  approximation  to  the  power  spectral 
density.    A  moving  average  model  of  order    m    produces  an    mth  order 
polynomial  approximation  to  the  spectral  density  function,  and  an 
n"1    order  autoregressive  model  produces  an    n       order  inverse  poly- 
nomial approximation  to  the  spectral  density  function. 

This  section  has  discussed  questions  (ii)  and  (iii)  raised  in 
section  I  .    A  remaining  question  of  key  practical  importance  is 
parameter  estimation  for  the  model  selected.    It  will  be  considered 
in  brief  detail  next. 

III.     ESTIMATION  OF  LINEAR  MODEL  PARAMETERS 

It  has  been  suggested  above  that  linear  models  have  valuable 
practical  utility  for  representing  time  series.    Final  employment  in 
real  applications,  however,  rests  upon  the  ability  to  estimate  the 
necessary  model  coefficients.    The  problem  differs  quite  significantly 
from  the  problem  of  plant  identification  in  control  system  engineering. 
In  the  present  case,  inputs  to  the  filter  are  known  only  in  statistical 
terms,  and  as  a  matter  of  fact,  are  not  physical  quantities  at  all, 
but  are  fictitious . 

In  the  section  above,  the  linear  filter  and  mixed  autoregressive- 
moving  average  models  were  asserted  to  be  equivalent.    It  is  there- 
fore helpful  to  express  the  linear  filter  in  such  a  form  that  corres- 
pondence between  parameters  of  the  two  models  may  be  easily  shown. 
The  "standard  canonical  form"  of  [21  ]  is  such  a  form.    It  is  obtained 
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through  a  linear  transformation  of  the  state  variables,  and  involves 
no  alteration  of  the  filter  input-output  relation.    The  observability- 
requirement  for  such  a  transformation  is  no  restriction  in  the  present 
situation.    In  the  "standard  canonical  form",  the  three 'matrices  de- 
scribing the  filter  input-output  relation  have  the  form: 


r    = 


'i 


^2 


<|    = 


-b. 


n 


-b2"bl 


H  = 


1 


After  normalizing  coefficients  of  the  mixed  autoregressive- 
moving  average  model  to  obtain    bn    equal  unity,  remaining  auto- 
regressive  coefficients    b.    directly  equate  to  elements  of  the    $ 
matrix  bottom  row  as  shown.    Equation  of  moving  average  coefficients 
a.A     with  filter  parameters  is  not  as  direct  and  is  given  below. 


m 


1 

b. 


>n-l 


>1 


1 


y 


n 


(m=n-l) 


The  relation  above  may  be  obtained  either  by  generalizing  from  low 
dimensional  systems  or  deriving  directly  for  the  general  case  [14  ]. 

No  general  techniques  presently  exist  for  estimating  the  para- 
meters a.    and  b.    of  a  linear  filter  in  a  straightforward  manner  from 
observed  data.    However,  if  the  time  series  can  be  considered  to  be 
sufficiently  represented  by  a  moving  average  model,  then  the 
following  relations  may  be  solved  to  obtain  the  coefficients  of  the 
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polynomial    A(z). 


a0al 
0  6Q 

0 


m 


am-l 


a0al 


0 


0  a 


0 


[V 

'r(O)" 

al 

r(D 

• 
• 

• 

• 

l  mJ 

• 

r(m)_ 

where    t{t)    are  values  of  the  observed  autocorrelation  function  of 

lag    t.    As  it  happens,  there  are  2m    possible  ways  in  which  zeros 

of  the  product   A(z)A(I  )    may  be  assigned  to    A    to  obtain  the  same 

z 

power  spectral  density  (and  autocorrelation  function) .    Hence  there  is 
an  indeterminacy  of  order    2m    in  the  resulting  coefficients  of  A(z). 
This  indeterminacy  may  be  resolved  by  using  the  fact  that  the  time 
series  is  real  and  by  forcing  all  zeros  of  A(z)    to  lie  on  or  within  the 
unit  circle.    The  net  effort,  however,  has  been  sufficient  to  dis- 
courage use  of  moving  average  models  of  very  high  order.    It  is  a 
useful  observation  to  note  that  the  autocorrelation  function  of  a 
series  generated  by  a  moving  average  model  is  zero  for  lags  greater 
than    m. 

Estimation  of  parameters  for  an  autoregressive  model  is  more 
straightforward.    The  problem  is  to  obtain  a  least  squares  fit  for 

the  coefficients    b.    in  the  relation 
i 


n 

L    b  y(k-i)  =w(k) 

1-0    x 


b=l,       k  =  1 ,  2 ,    . .  .  N 


subject  to  the  restriction  that  the  roots  of    B(z)    should  fall  within 
the  unit  circle.    The  results  of  such  an  approach  are  the  consistent 
set  of  linear  equations  [13  ]    (bQ  =  1),  where  the  r(r)  are  values  of 
the  observed  correlation  function  of  lag    T. 
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r(l)      r(0) 
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r(l)  r(0)  . 

.b 

r(l) 


r(n) 


An  alternate  approach,  a  recursive  approach,  is  also  possible  for 
estimating  autoregressive  model  parameters.    Least  squares  estimation 
problems  with  uncorrelated  residuals  can  be  conveniently  handled 
within  the  framework  of  the  Kalman  filter  [10  ].    Such  an  approach 
has  been  employed  by  Ho  and  Lee  [16  ]  to  estimate  the  coefficients 
of    B(z).    From  the  autoregressive  model  definition, 


where 


Let 


y(k)  =  0]y(k-i)  +  <^y(k-2)  + 


0  ,    =    -b  ,    ,     0  o   =     ~b2       etC 


T 

0     =  (  0  j_  0  2»    •    •    •   0  n) 


y,   =  (y(k-l)        y(k-2)  ....    y(k-n)) 
— k 


0ry(k-n)  +  v(k) 


Then 


y(k)  =£,    0     +  v(k) 


k^ 


For  the  corresponding  Kalman  filter,  the  required  quantities  are: 
T 


T  q  r  x  =  0 

3  =1 


X  =  0 


z(k)  =  y(k) 


H  =  h=y_k      R  =  r  =  E[v(k)2] 


At  a  glance,  it  is  clear  that  the  computations  required  are  simple. 
Since  the  filter  output  is  a  scalar,  no  matrix  inversion  is  required. 
Further,  gain  computations  are  simplified  here  since    P(k/k-l)  = 
P(k-l/k-l).    It  may  first  appear  that  ignorance  of    r    will  present 
difficulty.     However,  normalizing  the  weighting  computations  with 
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respect  to    r    and  employing  the  suboptimal  but  practically  useful 
device  [21  ]  of  a  very  large  P(l/0)  avoids  the  problem. 

The  recursive  approach  is  quite  flexible.    If  it  is  desired  to 
estimate  the  parameters  of  the  best  autoregressive  model  for  the 
observed  time  series,  each  sample  in  turn  is  processed.    On  the  other 
hand,  if  it  is  required  to  estimate  the  denominator  coefficients  of  a 
linear  filter  having  a  numerator  of  order    m,  then  processing. only 
every    m+1    samples  guarantees  uncorrelated  residuals  (recalling 
that  correlation  in  a  moving  average  model  is  nonzero  only  over    m 
lags) .    Then  the  numerator  coefficients  may  be  obtained  by  pro- 
cessing the  residuals  left  after  treating  the  observations  with  the 
denominator  in  an  autoregressive  fashion. 

The  approach  of  Ho  and  Lee  has  been  used  in  experiments  to 
estimate  the  model  coefficients  for  an  autoregressive  representation 
of  a  random  signal.    Results  are  given  in  the  next  section. 
IV.     EXAMPLES  OF  MODEL  DETERMINATION 

Various  facets  of  the  discussion  above  may  be  made  clearer  by 
an  illustrative  example.    Recursive  parameter  estimation  was  employed 
to  obtain  an  autoregressive  model  for  a  contrived  laboratory  signal. 
The  model  to  be  found  was  arbitrarily  chosen  to  be  of  sixth  order. 

The  contrived  signal  consisted  of  the  additive  combination  of 
a  50  cycle  sine  wave  and  the  output  of  a  laboratory  random  noise 
generator.    Originally  generated  in  analog  form,  the  signal  was 
sampled  by  an  analog-to-digital  converter  under  control  of  a  CDC 
160  computer,  and  the  digital  samples  were  stored  on  magnetic 
tape  in  4000-sample  blocks.       Since  initial  analysis  disclosed  little 
noise  energy  above  approximately  400cps,  the  sampling  interval  was 
effectively  increased  from  the  original  400  jas  to  1200  jis  by  using 
only  every  third  sample,  giving  a  Nyquist  frequency  of  416  cps  .    Of 
course,  the  resulting  sample  size  used  in  obtaining  the  model  was 

*The  very  useful  programs  used  to  accomplish  the  digitizing/ 
recording  operation  and  necessary  subsequent  unpacking  are  due  to 
N.  Barrett  [2  ]. 
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reduced  to  a  third  of  the  original  4000  samples. 

Two  runs  were  made,  using  signal-to-noise  ratios  of  approx- 
imately   0  db    and    -10  db.    Results  are  displayed  in  Figs.  3-1  to 
3-6.     Plots  of  z-plane  poles  of  the  resulting  models  are  given  in 
Figs.  3-1  and  3-4.    The  remaining  figures  for  each  run  contain  com- 
parative plots  of  autocorrelation  functions  and  power  spectral  den- 
sities derived  from  the  signal  and  model.    A  visual  indication  of 
model  efficiency  is  provided  by  the  comparative  power  spectral 
density  plot. 
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Model  Poles  -  50  cps  Sine  Wave  in  Random  Noise 
0  db  Signal/Noise  Ratio 
Fig.  3-1 
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Autocorrelation  Functions  Computed  from  Data  and  Model 

S/N  =  0  db 
Fig.  3-2 
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Power  Spectral  Densities  Computed  from  Data  and  Model 

S/N  =  0  db 
Fig.  3-3 
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Fig.  3-4         Model  Poles  -  50  cps  Sine  Wave  in  Random  Noise  - 

-10  db  Signal/Noise  Ratio 
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Fig.  3-5     Autocorrelation  Functions  Computed  from  Data  and 
Model  -  S/N  =  -10  db 
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Fig.  3-6     Power  Spectral  Densities  Computed  from  Data 
and  Model  -  S/N  =  -10  db 
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CHAPTER  IV 
DETECTION  OF  RANDOM  SIGNALS  IN  NOISE 

The  problems  of  detecting  both  deterministic^and  random  signals 
in  the  presence  of  additive  Gaussian  noise  have  received  considerable 
attention  in  the  literature  [14  ],   [22  ],   [23  ],  [34  ],  [37  ].    It  is 
well  known  that  for  the  former  case,  the  optimum  receiver  is  a  cross 
correlatbFii  or  matched  filter,  which  correlates  the  input  with  a  replica 
of  the  transmitted  signal. 

Random  signals  considered  previously,  and  to  be  considered  here, 
have  been  assumed  to  be  of  the  somewhat  restrictive  but  tractable 
Gaussian  form  with  zero  mean  and  known  autocorrelation .     (This 
assumption  will  be  discussed  in  greater  detail  below.)    Kailath  [18  ] 
has  shown  that  the  optimum  receiver  for  such  Gaussian  signals  in 
Gaussian  noise  forms  a  cross  correlation  of  the  input  and  a  least 
squares  estimate  of  the  signal  based  on  fcnput  received  in  the  inter- 
val under  scrutiny. 

The  results  obtained  in  [18  ]  suffer  practical  drawbacks,  however. 
Critical  computational  difficulties  are  faced  in  obtaining  the  least 
squares  estimate  when  the  number  of  samples  considered  is  large, 
as  it  is  required  to  invert  a  matrix  of  the  same  dimension  as  the 
number  of  samples.    Moreover,  statistical  descriptions  of  possible 
signals,  which  were  assumed  known,  are  in  practice  not  easy  to 
obtain. 

The  present  work  approaches  this  problem  in  a  manner  somewhat 
similar  to  that  of  Weaver  [35  ],  and  employs  the  results  of  Schweppe 
[31  ].    The  serious  computational  difficulties  noted  above  are 
avoided.    Specifically,  estimation  techniques  developed  in  recent 
years  by  Kalman  and  others  [19  ],  [20  ],   [26  ],   [27  ]  are  employed. 
Furthermore,  in  view  of  the  similarities  between  correlation  signal 

In  the  discussion  to  follow,  a  deterministic  signal  will  be 
defined  as  one  whose  form  is  completely  known  at  the  receiver,  and 
a  random  signal  will  be  one  whose  description  is  known  at  the 
receiver  only  in  statistical  terms. 
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detection  and  linear  discrimination  methods  in  pattern  recognition,  it 
is  felt  that  the  approach  to  be  followed  here  may  constitute  a  new  and 
more  powerful  approach  to  certain  pattern  recognition  problems  . 

For  simpler  and  clearer  presentation  of  results,  the  analysis  to 
follow  will  treat  the  discrete,  or  sampled,  situation.    However,  the 
techniques  to  be  employed  are  equally  applicable  to  continuous  pro- 
cessing . 

I.     OUTLINE  OF  THE  PROBLEM 

The  problem  to  be  considered  is  the  following:    a  signal  con- 
sisting of  a  finite  number    N    of  samples  of  a  Gaussian  process 
y(t)    is  received  during  a  limited  time  interval    T    in  the  presence  of 
additive  Gaussian  noise.    The  additive  noise  and  signal  process  are 
assumed  independent.    In  practice,  real  time  processing  is  necessary, 
and  it  is  required  to  announce  at  the  end  of  the  interval  whether  or 
not  the  signal    y(t)    has  occurred.    It  will  be  seen  later  that  the  present 
limitation  to  a  single  signal  is  not  restrictive.    The  same  approach 
applies  to  the  problem  of  detecting  and  discriminating  members  of  a 
finite  set    {ym(t)}   of  signals. 

A  word  about  the  assumption  of  Gaussian  signals  is  in  order. 
Aside  from  transmitted  signals  whose  forms  are  simply  not  known  at 
the  receiver  except  in  terms  of  their  first  and  second  order  statistics, 
it  happens  that  the  scatter -multi  path  structure  of  ionospheric  prop- 
agation perturbs  transmissions  into  Gaussian  signals  [27  ].    The 
same  situation  is  produced  in  radar  and  sonar  by  clutter  and  rever- 
beration resulting  from  the  effects  of  a  multitude  of  small  random 
scatterers . 

II.     CORRELATION  DETECTION  OF  DETERMINISTIC  SIGNALS 

It  will  be  convenient  to  first  consider  the  problem  of  correlation 
detection  of  deterministic  signals  in  order  to  establish  notation 
conventions,  more  clearly  outline  the  problem,  etc. 

From  the  problem  statement  above,  the  task  at  hand  is  clear. 
The  received  signal,  a  mixture  of  the  transmitted  signal  and  additive 
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white  noise,  must  be  processed  to  yield  an  indication  of  which  trans- 
mitted signal  in  a  finite  set  of  possible  signals  {ym(t)}  was  received. 
It  is,  of  course,  desired  that  the  indication  be  as  reliable  as  possible 
under  the  circumstances. 

The  essential  feature  of  the  problem  is  extraction  of  information 
about  the  transmitted  signal  from  the  mixture  of  transmitted  signal  and 
additive  noise.    Information  concerning  the  transmitted  signal  is 
clearly  more  germane  to  the  issue  than  is  determination  of  signal-to- 
noise  ratios  [37  ],  though  the  information  sought  and  signal-to-noise 
ratio  may  often  be  monotonically  related.     Possessing  as  much  infor- 
mation about  the  transmitted  signal  as  it  is  possible  to  isolate  from 
the  received  signal  available  for  processing,  one  may  proceed  with 
the  required  decisions.    The  decision  making  will  commonly  also  con- 
sider other  factors,  such  as  relative  penalties  for  incorrect  decisions, 
any  a  priori  information,  etc. 

As  mentioned,  only  a  mixture,  here  called    z(t),  of  the  trans- 
mitted signal    yi(t)    and  noise  v(t)    is  available  for  processing,  i.e. 
z(t)  =  yi(t)  +  v(t).    The  noise    v(t)    is  assumed  to  be  white  Gaussian 
noise  with  mean  zero  and  known  variance    r.    Since  the  discrete  case 
is  being  considered  here,  what  is  available  for  processing  is  a  se- 
quence of  samples    z(k)    of    z(t),  where    z(k)  =  yx(k)  +  v(k). 

At  this  point,  it  will  simplify  notation  if  we  define  vectors    y1 
and    z ,  where 

yi  =  yi(l),  yi{2) yi(N) 

z  =  z(l),  z(2) z(N) 

From  the  problem  statement,  it  can  be  seen  that  what  must  be 
performed  is  a  multiple-alternative  hypothesis  test  [22  ].    Stated 
another  way,  as  viewed  by  the  receiver,    y    is  a  discrete  random 
variable  having  the  possible  values    y1     i  =  1,  m.    All  information 
at  the  receiver  about  the  transmitted  signal    y    is  contained  in  the  con- 
ditional probabilities  of    y    given  the  observations    z,  p(yVz)  [37  ]. 
It  is  these  conditional  probabilities  which  will  be  considered  further. 
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Remaining  aspects  of  the  decision-making  problem  (relative  costs 
of  errors,  etc.)  depend  upon  the  specific  situation. 

The  conditional  probabilities    p(yVz)    may  be  stated  in  terms  of 
given  or  easily  determined  quantities  through  use  of  Bayes  Formula 
[17], 

p(yi/z)  =    p(z/yi)p(yi) 
p(z) 

On  the  right  hand  side,    ply1)    is  the  a  priori  probability  of  signal 
y1    (assumed  given).    p(z)    is  not  a  function  of  the  transmitted  signal 
yi  and  may  be  considered  a  normalizing  factor.    The  remaining  quantity, 
pfe/y1)    is  a  function  both  of  the  transmitted  and  received  signals,  and 
must  be  considered  in  detail. 
Expanding  the  notation, 

p(z/yi)  =  p[z(l),  z(2)   .   .   .    .z(N)/yi(l),  yi(2)   .    .   .    .y    (N)] 

Due  to  whiteness  of  the  additive  noise,  consecutive  samples  of 
n(t)    are  uncorrelated .     Hence 

z(l)/yMl),  yM2)  .  .  .  y1  (N)  =  z(i)/yi  (l)  ~  NLyMl),  r] 
z(2)/yi(D,  yM2)  .    .   .yMN)  =z(2)/yi(2)    ~  N[yi(2),  r] 

etc. 

Furthermore,    zOJ/y1^)    and  z(k)/y1(k)    are  independent  random 

variables  for  all    j  ^k.    Therefore, 

p[z(l),  z(2)   .   .    ^(Nj/yMD,  yM2)   .    .   .  yMN)] 

=    f]      pEzO/y^)] 

j=l 

In  more  compact  form, 

_N  N 

p(z/yM  =  (2  nr)    2    exp  -  1_      L    [z(j)  -  y1  (j)]  Z 

2r     j=i 
x  N  N  N 

=  C  exp  --LIE     z(j)2  +    I   yi(j)2  -  2  I     zOy^l 
2rtj=l  j=l  j=l  J 
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Inside  the  brackets,  the  first  term  represents  received  signal  energy, 
a  factor  which  will  cancel  in  later  comparisons  of  conditional  prob- 
abilities.   It  can  therefore  be  incorporated  into    C,    The  second  term 
represents  energy  of  the  transmitted  signal    y1.    If  all  the  possible 
transmitted  signals  have  been  normalized  so  as  to  possess  the  same 
energy,  this  term  can  also  be  incorporated  into    C.    Otherwise,  it 
may  be  carried  along  as  an  additional  constant    C1.    The  situation 

becomes, 

N 
p(z/yi)  =  CCi  exp   I  £     zOyiQ) 

r  j=l 

It  is  seen  that  the  optimum  receiver,  which  furnishes  the  con- 
ditional probabilities    p(yi/z)    to  the  decision  making  processor, 
forms  a  cross  correlation  of  the  received  signal  and  the  transmitted 
signal  and  uses  the  result  in  computing  the  desired  conditional 
probability.    The  resulting  optimum  receiver  structure  is  shown  in 
Fig.  4-1. 

The  subject  of  correlation  detection  is  far  from  exhausted.    In 
particular,  the  decision  philosophy  to  be  employed  and  details  there- 
of have  been  omitted.    The  purpose  of  this  section  has  been  to  es- 
tablish a  frame  of  reference  to  be  extended  below  in  discussion  of 
random  signal  detection.    Further  discussion  of  the  problem  of 
correlation  detection,  as  well  as  references  to  the  literature  on  this 
subject,  may  be  found  in  [22  ],   [37  ]. 

While  the  discussion  above  has  followed  the  decision-theo- 
retically  optimum  Bayesian  approach  to  signal  detection,  situations 
exist  where  no  meaningful  values  for  the  required  a  priori  prob- 
abilities can  be  obtained.    It  is  customary  in  such  situations  to 
employ  the  philosophy  of  Maximum  Likelihood  suggested  by  R.  A. 
Fisher.    Evaluation  of  the  Likelihood  Function,  defined  as  the  same 
conditional  probability  density  function    p(z/yx)    considered  above, 
is  required.    The  resulting  quantities  are  weighted  by  relative  costs 
of  misclassification  and  compared,  yielding  a  decision  that  thak-eignal 
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y1      is  present  which  corresponds  to  the  maximum  Likelihood  Function 
[8  ].    The  essential  point  is  that  the  Likelihood  Function  must  be 
evaluated  in  either  approach.    Hence  one  may  approach  the  task  with 
relatively  more  confidence  that  efforts  are  well  directed. 
III.     RANDOM  SIGNAL  DETECTION 

In  this  section,  the  ideas  developed  above  will  be  extended  to 
handle  detection  of  Gaussian  signals  in  white  Gaussian  noise.    First 
the  approach  of  Kailath  [18  ]  will  be  discussed  and  its  computational 
difficulties  noted.    Then  application  of  recently  developed  filtering 
and  smoothing  techniques  to  this  problem  will  be  considered. 

Here  the  signal  is  assumed  to  comprise  a  sequence  of    N    samples 
from  a  Gaussian  process  whose  mean  value  function  and  covariance 
function  are  known.    Hence,  each  signal  vector    y1    possesses  a 
multivariate  normal  distribution  with  mean  vector    \x=  0    and  co- 
variance  matrix    K  x  . 

y 

K  i     =  Efriy  T)  =  (R1   )  i,  j  =  1,  2   .   .    .   .    N 

y  ij 

Just  as  in  detection  of  deterministic  signals  above,  the  problem  re- 
duces to  evaluation  of  the  Likelihood  Function    p(z/y1).    Since  the 
signal  and  additive  noise  are  assumed  statistically  independent,  the 
received  signal  covariance  is 

K  i    =    K  i     +  R  where    R  =  rl 

z  y 

The  Likelihood  Function  is 

p(z/yi  )  =  (2  77)-  f    IK1    \~\     exp -1  [zT  (K1  )_1z] 
z  z  l  2  z 

It  may  be  expanded  using  a  matrix  inversion  lemma  arising  from  the 
Frobenius-Schur  Relation  [7  ].     (Since  only  a  single  signal  is  being 
considered  for  the  moment,  notation  will  be  simplified  by  omitting 
the  superscript  temporarily.) 

K  _1      =R_1      -  R_1  (R_1     +K'1    J"1     R"1 
z  y 
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Thus, 

p(z/y)  =  C  exp  -  I   [  zTR_1z      -      zT    [R_1    (FT1  +  K  ~l)~l  R"T]  z  j 

The  first  term  in  the  exponent  does  not  involve  the  signal,  hence  will 
cancel  in  the  decision  process,  and  may  be  carried  as  a  constant  C, 
leaving 

p(z/y)  =  CC^expI  fz  T    [FT1   (R_1    +  Ky-1  )_1    R_1]z] 

Now 

(R^  +  K"1)"1!*"1      =  (I    +    RK  _1  )  _1 
x  y  y 

=  Ky  (Ky  +  R)-1 

=  K    K  _1      =   A 
y   z 

It  is  easily  shown  [18  ],  and  included  below,  that   A    is  the 
weighting  matrix  yielding  the  least  mean  squared  error  estimate  vec- 
tor   y    of    y    given  the  observation  vector    z. 

What  is  required  is  the  matrix   A    such  that  if 
y  =  Az 
the  mean  squared  error  between    y    and    y    will  be  minimized .    Error 

in  estimating  the    i    the  component  of    y    is 

N 
e.    =    y.    -    L    a. .    z  . 
J=l 

Hence, 

N        N   N  

ef=y\-2L3ij   z.  y.  +  I  L  a       a.k    z.zk 

j  =  1  j=W 

N  N      N 

-R      (0)-2La  R      (i,j)    +  l      L    a     a^  RZ2  «,k) 

j=l  j=l   k=l 

For  a  minimum, 

=0  j  =  1,    .    .    .    .  N 

oaij 
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Thus  N 

Ryy      (i/j)  =    I     aik  R  zz  (j,k) 

k=l 

(Note  that    R       (  t)  =  R  (  r).)    In  matrix  form, 

y*s  yy 

K       =AK  H>  A  =  K     K_1 

y  z  -^  y     z 

Having  computed    A,  the  Likelihood  Function  to  be  determined  takes 
the  form, 

p(z/y)  =  CC1      exp    ^       (zT  Az) 

2r 

=  CC1       exp     -J-     (z  Ty) 
2r  * 

It  is  seen  that  the  optimum  receiver  for  detecting  Gaussian  signals 

forms  a  cross  correlation  of  the  received  signal  and  a  least-squares 

estimate  of  the  Gaussian  signal  given  the  received  signal.    As  in  the 

case  of  deterministic  signal  detection.,:  this  cross  correlation  is  used 

to  evaluate  logarithms  of  the  conditional  probabilities    pte/y1) 

i  =  1,   .    .   .   .m,  which  are  in  turn  furnished  to  the  decision  making 

processor.    The  resulting  receiver  structure  is  shown  in  Fig.  4-2. 

Recalling  that  the  covariance  matrices    K_,    and  K„    are  of 

y  c 

dimension    NxN,  where    N    is  the  number  of  samples  to  be  processed, 
the  computational  difficulties  in  obtaining    A  =K    K~*     are  evident. 
Using  the  fact  that    K      is  a  Toeplitz  matrix,  Trench  [33  ]  has 
developed  an  algorithm  for  obtaining  the  inverse  requiring  effort 
proportional  only  to  the  square  of  the  matrix  dimension,  vice  its 
cube.    Thus  the  problem  is  reduced  in  magnitude,  but  remains  quite 
a  task  for  even  moderately  large  sample  sizes,  say  200.    Unfortunately, 
an  even  worse  problem  must  still  be  faced.    In  the  present  case  of 
random  signals,    Kz    is  a  function  of  K*    .    The  quantity  which  was 
carried  as  a  constant    C    (since  it  would  later  cancel)  in  the  Like- 
lihood Functions  of  deterministic  signals,  must  now  be  evaluated. 
Evaluation  of  the  determincint  of    K      is  involved.     Even  with  exploitation 
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of  the  Toeplitz  form  of    K    ,  the  problem  remains  a  considerable  one. 

The  signal  above  was  assumed  to  consist  of  a  sequence  of 
samples  from  a  Gaussian  random  process  with  known  mean  (zero)  and 
known  covariance  function.    Description  of  the  Gaussian  process  in 
an  equivalent  but  different  manner  will  permit  solving  the  problem 
while  avoiding  the  difficulties  above.    As  discussed  in  Chapter  III, 
the  Gaussian  process  may  be  represented  to  any  desired  degree  of 
approximation  as  the  output  of  a  linear  dynamical  system  excited  by 
white  Gaussian  noise. 

Statistical  description  of  the  Gaussian  message  process  will 
then  be  in  terms  of  the  mean  (zero)  and  covariance    q    of  the  excitation 
noise,  and  in  terms  of  matrices    T,  <$,  and    H    defining  the  input- 
output  relation  of  the  linear  dynamical  system.    As  before,  this 
statistical  description  of  the  message  process  is  assumed  known. 

An  alternate  approach  to  the  random  signal  detection  problem  may 
be  pursued  with  the  message  process  description  above  using  the 
results  of  Schweppe  [31  ].    It  involves  developing  a  difference 
equation  for  the  Likelihood  Function  (or  rather  its  logarithm) ,  from 
which  the  complete  Likelihood  Function,  not  just  its  exponent,  may 
be  determined  recursively  as  signal  samples  are  received.    Both  the 
problems  of  inverting  the    NxN    matrix    Kz    and  evaluating  its 
determinant  are  avoided. 

Let 

L(yM    =    ptz/y1) 
Then 

-2  In    LlvT(yi)=ln    (2tt)N|k     I    +    zTK-1z 
N  '     z  '  z 

Now, 

Lk  (y1)  =  p  [z(l) z(k)/y  l] 

=  p  [z(l)  .   .   .   .zfr-D/y1  ]p  [z(k)/z(l)  .   .  .zfc-lhy1] 
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Hence, 

lnL^y1   )  -  In  3^  (y1)    =  In  p  [z(k)/z(l)  .    .    .  z(k-l),  y  *] 

The  conditional  probability  density  function 

p[z(k)/z(l) z(k-l),  y1] 

must  be  considered  further.    Under  the  hypothesis  that  the  transmitted 
signal  is    y1     ,  the  matrices  describing  its  statistical  characteristics 
are  r1,  &1 ,  and    H1.    A  Kalman  filter  using  these  matrices  and  oper- 
ating on  the  received  signal  data  computes  values    x(k/k-l)    and 
P(k/k-l)    which  define  the  conditional  distribution 

p(x(k)/z(l) z(k-l),  y1  ) 

=  C  exp    -I[(x(k)  -  x(k/k-l))TP(k/k-l)_1   (x(k)-x(k/k-l))] 
2 

From  this,  it  follows  simply  that 

p[z(k)/z(l) z(k-l),  y1    ] 

=  (2  7r)_2  [HP(k/k-l)HT+  r]    ^    exp  -    [z(k)  -  z(k/k-l)32 

2(HP(k/k-l)HT+  r) 

Defining 

~(k)    =    z(k)  -  z(k/k-l) 
yields 

-2    lnp[z(k)/z(l) z(k-l),  y  i  ] 

=  In  2  7T[HP(k/k-l)HT  +  r  ]  +  z"(k)2 

[HP(k/k-l)HT+  r] 

A  difference  equation  in  -2  In  L    (y1).has  now  been  determined. 
The  original  requirement  was  to  evaluate    L^  (yi) ,  or,  equivalently, 
-2  In  L^j  (y1) .    Hence,  the  differences  are  simply  summed,    k  =  1, 
2,    .   .   .  N.    The  result  is 

-2  In  L    (y1)    =  In  (2  ff)N   I  K     I    +  z  TK_1z 

N  '     z  '  z 
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N  N 

=  £    In  2  ff[HP(k/k-l)HT  +  r  ]+    £  ?(k)2 

k=l  k=l     [HP(k/k-l)HT+  r] 

It  can  be  seen  that  -2  In  L^y1)    has  been  built  up  from  sums  of  scalar 
quantities.    Required  quantities    z(k)    and    [HP(k/k-l)H     +  r]     are 
already  available  in  the  Kalman  filter.    The  net  result  is  that  the  great 
practical  difficulties  of  inverting  and  evaluating  the  determinant  of  a 
large  matrix  have  been  avoided  by  an  approach  requiring  no  matrix  in- 
version at  all.    The  structure  of  the  optimum  receiver  using  this 
approach  is  shown  in  Fig.  4-3. 

On-line  computations  can  be  greatly  reduced  by  considering  use 
of  the  Likelihood  Function  above  and  some  details  of  filter  behavior. 

First  of  all,  since  Likelihood  Functions  for  the  various  alternatives 

N 

are  to  be  compared,  the  term      y      In  2  if  will  cancel  and  hence  may 

k=l 
be  discarded.    Further,  it  frequently  happens  that  the  quantity 
[HP(k/k-l)H™  +  r]      settles  within  a  few  time  increments  to  its  steady 
state  value.    In  such  cases,  all  necessary  values  of  filter  gain  vec- 
tors   B(k)    and    [HP(k/k-l)H^+ r]    may  be  computed  beforehand  and 
stored  with  modest  memory  requirements.    This  represents  quite  a 
saving  since  the  greater  part  of  computation  required  by  the  Kalman 
filter  asi  associated  with  updating  the  variance  equation  and  computing 
gains .    On-line  computations  are  thus  reduced  to  updating  the  state 
vector    x(k/k)    with  new  observations  and  adding  in  the  new  terms  to 
the  Likelihood  Function  evaluation.    From  the  filter  equations  listed 
in  Chapter  II,  this  may  be  seen  to  amount  to  approximately  2n  +  2 
multiply-add  combinations  per  received  sample  for  each  classification 
alternative,  where    n    is  the  order  of  the  model. 
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CHAPTER  V 
AN  EXPERIMENTAL  STUDY  IN  RANDOM  SIGNAL  DETECTION 
Finally,  utility  of  theoretical  developments  in  engineering  appli- 
cations rests  upon  ability  of  the  developed  techniques  to  perform 
successfully  inthe  "real  world".    Sensitivity  of  performance  to  dis- 
crepancies in  actual  and  assumed  parameters  must  be  low  enough  to 
permit  satisfactory  operation  in  the  face  of  such  discrepancies.    In 
terms  of  the  detection/classification  scheme  discussed  in  Chapter  IV, 
employing  recursive  evaluation  of  Likelihood  Functions,  some  points 
to  be  investigated  are: 

(i)    Sensitivity  to  variations  in  the  model  quality  as  measured 
by  the  ratio  of  "correlated"  to  "uncorrelated"  components 
of  the  received  signal, 
(ii)    Sensitivity  of  detection  performance  to  discrepancies  be- 
tween actual  and  assumed  values  of  received  signal  power, 
(iii)    Rapidity  with  which  the  technique  delivers  a  clearcut 
classification  decision. 
In  this  chapter,  the  above  points  are  discussed  in  the  context  of  an 
example  problem  using  actual  signals. 

I.     DESCRIPTION  OF  THE  PROBLEM 
In  the  following  example,  the  two-alternative  or  detection  problem 
is  studied.    Signals  used  are  hydrophone  recordings  of  (a)  sea  noise 
plus  sounds  of  a  diesel  submarine,  and  (b)  sea  noise  alone.    In  dis- 
cussions below,   (a)  is  referred  to  as  "signal"  and  (b)  as  "noise".    The 
resulting  problem  is  as  follows:    given  a  segment  of  received  signal 
record,  produce  a  Maximum  Likelihood  judgement  as  to  whether  the 
received  signal  is  submarine  or  noise  alone. 

Signals  used  here  were  originally  in  analog  form  on  magnetic 
tape,  and  were  digitized  for  use  in  computation  using  programs  de- 
scribed in  connection  with  the  example  in  Chapter  III.    The  basic 
sampling  interval  was  200  jxs.    However,  as  initial  analysis  disclosed 
little  signal  or  noise  energy  above  800  cycles  or  so,  only  every  third 
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sample  was  employed  in  computations,  extending  the  sampling  interval 
to  600  Jis.    The  total  effective  number  of  samples  per  block  was  thus 
reduced  to  1333.    Figs.  5-1  and  5-2  display  plots  of  IBO  samples  of 
signal  and  noise,  respectively.    Fig.  5-3  contains  a  comparative  plot 
of  sample  autocorrelation  functions  of  the  signal  and  noise. 

For  study  purposes,  ten  4000-sample  blocks  each  of  signal  and 
noise  were  digitized.    The  first  five  blocks  of  each  group  were  used 
for  obtaining  models.    Testing  of  the  detection  procedure  was  then 
done  using  the  remaining  five  blocks  in  each  group. 

II.     MODEL  DETERMINATION 

For  ease  in  determination,  the  autoregressive  form  of  model  was 
selected  for  the  signal  and  noise  sources.  .  Order  of  the  models  was 
chosen  to  be  six,  since  the  signal  was  known  a  priori  to  have  three 
prominent  "lines"  at  240,  390  and  625  cps.    For  less  or  no  a  priori 
knowledge  about  the  signal  structure,  a  procedure  might  be  to  begin 
with  low  order  models  and  successively  increase  the  order  until  the 
comparative  power  spectral  density  plots  indicate  satisfactory  rep- 
resentation.   For  the  present  problem,  recursive  estimation  of  model 
coefficients  as  in  the  example  of  Chapter  III  was  performed.    Resulting 
models  were  found  to  be  quite  consistent  from  block  to  block  for  the 
signal.    Noise  models  had  consistent  dominant  poles,  but  somewhat 
scattered  secondary  poles.    Figure  5-4  shows  the  z-plane  pole  lo- 
cations for  signal  models  from  the  first  five  blocks  of  signal  data. 
Figure  5-5  shows  the  same  information  for  noise  models. 

Selection  of  a  specific  model  for  use  in  detection  might  be  made 
by,  say,  averaging  the  corresponding  coefficients  of  the  five  sample- 
derived  models  for  each  case.    A  simpler  but  less  precise  method 
would  be  to  just  select  a  representative  model  from  each  group.    The 
latter  method  was  used  for  the  experiments  here.    Figures  5-6  and 
5-9  contain  z-plane  pole  plots  for  the  selected  signal  and  noise  models 
respectively.    Figures  5-7  and  5-8  display  comparative  plots  of  auto- 
correlation functions  and  power  spectral  densities  derived  from  the 
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Fig.  5-1      Sea  Noise  Plus  Diesel  Submarine  ("Signal") 


X -SCALE  -  1.8GE+81  UNJTS/JNCH. 
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Fig.  5-2     Sea  Noise  ("Noise") 
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Pig.  5-3     Autocorrelation  Function  of  "Signal"  and  "Noise" 
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Fig.  5-4     Z-Plane  Poles  of  Autoregressive  Signal  Models 
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Fig.  5-6    Poles  of  Representative  Signal  Model 


75 


Si 


- 
- 

. 

■    .'     i 

1           .         :                      ' 

i 

i 

. 

\ 

| 

1 

i 

1 

A 

/ 

\      i     \ 

//\   \ 

»e  \       / 

005\       | 

\     // 
I    II 

,    \  / 
»  / 
\  I 
1/ 
W 

aie   //      \\ 

915/         U 

T0     \\     // 

RS 


Fig.  5-7   Autocorrelation  Functions  from  Signal  Data 
and  Signal  Model 
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Fig.  5-8    Power  Spectral  Densities  from  Signal  Data  and 
Signal  Model 
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Fig.  5-9    Poles  of  Representative  Noise  Model 
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Fig.  5-10   Autocorrelation  Functions  from  Noise  Data  and 
Noise  Model 
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Fig.  5-11    Power  Spectral  Densities  from  Noise  Data 
and  Noise  Model 


80 


data  sample  and  model  respectively.    Figures  5-10  and  5-11  convey 
the  same  information  about  the  noise  model. 

III.    STUDY  OF  PERFORMANCE 

For  an  honest  evaluation  of  discrimination  power  of  the  technique 
heing  tested,  it  is  important  that  signal  classification  be  made  solely 
on  the  basis  of  differences  in  correlation  structure  of  the  alternate 
possibilities  and  not  on  energy  levels.    For  this  reason,  each  batch  of 
4000  samples  from  which  sub-blocks  of  100  samples  were  processed  by 
the  detector  was  normalized  to  have  a  mean  squared  sample  value  of 
unity . 

In  terms  of  the  discrete  signal  process  model  pictured  in  Fig.  2fc2, 
this  means  that    E[z2]  =  1.    But  since 

E[z2]=E[y2]+r/ 

it  remains  to  be  determined  what  is  the  most  likely  ratio  of    E[y2]  to 
r.    The  ratio  is  a  measure  of  the  degree  to  which  the  model  obtained 
represents  the  actual  observed  signal. 

Using  a  value  of  .9  for  this  ratio,  the  Likelihood  Function  for 
each  alternative  is  computed  as  a  function  of  the  number  of  samples 
processed  and  is  plotted  in  Fig.  5-12.    Since  the  quantity  -2  In 
Likelihood  Function  is  actually  computed  and  displayed,  higher  like- 
lihoods are  represented  by  lower  ordinates  on  the  graph.    In  Fig.  5-12 
samples  processed  were  actually  "signal".    Hence  correct  classification 
is  indicated  by  the  graph. 

Program  QTEST  then  evaluates  the  Likelihood  Function  for  each 
classification  alternative  for  values  of  this  ratio  from  .1  to  .9. 
Figures  5-13  to  5-16  display  the  resulting  values  of  the  Likelihood 
Function  versus  the  number  of  data  samples  processed.    Note  that 
correct  classification  was  achieved  for  all  values  of  the  ratio  between 
.1  and  .9.    It  might  also  be  noted  that  the  ratio  value  of  .9  gave  the 
greatest  margin  of  correct  classification.    Correct  classification  was 
also  consistently  obtained  in  the  same  manner  for  all  blocks  of  in- 
dependent data  (not  utilized  in  model  determination). 
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Fig.  5-12     Likelihood  Functions  for  Signal  Data  Computed 
Using  Signal  and  Noise  Models 


82 


Fig.  5-13    Likelihood  Functions  of  Signal  Using  Signal  Model 
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Fig.  5-14    Likelihood  Functions  of  Signal  Using  Noise  Model 
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Fig.  5-16    Likelihood  Functions  of  Noise  Using  Noise  Model 


86 


The  normalization  process  described  above  is  roughly  equivalent, 
in  practical  terms,  to  automatic  gain  control  (AGC) .    It  is  important 
that  the  detection  scheme  continue  to  function  efficiently  under  de- 
viations between  actual  and  assumed  amplitudes  (and  corresponding 
mean  square  values).    Hence  a  test  was  undertaken  to  check  classi- 
fication efficiency  against  signals  with  actual  mean  square  values 
ranging  from  one  fourth  to  four  times  the  assumed  value.    Again,  con- 
sistent correct  classification  was  maintained.    Figures  5-17  to  5-24  t 
contain  the  resulting  Likelihood  Function  plots  for  actual  mean  square 
values  equal  to  one  fourth  and  four  times  the  assumed  value,  in  all 
cases  equal  to  unity. 
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Fig.  5-17     Likelihood  Functions  of  Signal  Using  Signal  Model 
Mean  Square  Signal  Received  Equal  One-fourth 
Assumed  Value 
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Fig.  5-18     Likelihood  Function  of  Signal  Using  Noise  Model 
Mean  Square  Signal  Received  Equal  One-fourth 
Assumed  Value 
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Fig.  5-19     Likelihood  Function  of  Noise  Using  Signal  Model 
Mean  Square  Signal  Received  Equal  One-fourth 
Assumed  Value 
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Fig.  5-20     Likelihood  Function  of  Noise  Using  Noise  Model  - 
Mean  Square  Signal  Received  Equal  One-fourth 
Assumed  Value 
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Fig.  5-21      Likelihood  Function  of  Signal  Using  Signal  Model 
Mean  Square  Signal  Received  Equal  Four  Times 
Assumed  Value 
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Fig.  5-22     Likelihood  Function  of  Signal  Using  Noise  Model 
Mean  Square  Signal  Received  Equal  Four  Times 
Assumed  Value 
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Fig.  5-23     Likelihood  Function  of  Noise  Using  Signal  Model 
Mean  Square  Signal  Received  Equal  Four  Times 
Assumed  Value 
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Fig.  5-24     Likelihood  Function  of  Noise  Using  Noise  Model 
Mean  Square  Signal  Received  Equal  Four  Times 
Assumed  Value 
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CHAPTER  VI 
RECOMMENDATIONS  FOR  FURTHER  WORK 

A  frequent  happening  in  applied  research  is  that  there  are  as  many 
new  questions  raised  as  old  questions  resolved.    Such  has  been  the 
case  in  this  instance. 

Perhaps  the  problem  most  ripe  for  attack  is  that  of  estimating  num- 
erator coefficients  of  the  linear  filter  model  for  random  time  series . 
This  is  equivalent  to  estimation  of  the  moving  average  coefficients  in 
a  mixed  auto-regressive-moving  average  model.    A  promising  approach 
might  lie  in  numerical  processing  of  the  residuals  after  recursive  de- 
termination of  the  autoregressive  coefficients. 

Further  study  of  applications  of  the  methods  used  here  to  practical 
problems  in  signal  detection/classification  would  also  be  interesting 
and  potentially  valuable .    In  particular,  multiple-alternative  signal 
classification  should  be  interesting  to  consider.    Performance  in  such 
problems  should  be  improved  by  better  models  resulting  from  successful 
determination  of  linear  filter  model  numerator  coefficients . 

Reliability  of  detection/classification  as  a  function  of  record 
length  processed  and  as  a  function  of  some  measure  of  difference  be- 
tween power  spectral  densities  of  alternate  classifications  should  be 
valuable  to  investigate.    For  the  detection  problem,  differences  between 
power  spectral  densities  of  signal  plus  noise  and  noise  alone  relate  to 
signal-to-noise  ratio  in  the  conventional  problem  formulation. 

Classification  methods  used  herein  also  hold  potential  contri- 
butions it  character  recognition.    Required  for  application  of  the 
methods  here  are  temporally  vice  spatially  distributed  patterns.    How- 
ever, the  scanning  process  used  in  several  present  approaches  to 
character  recognition  already  produces  thffnecessary  spatial-temporal 
transformation.    The  advantage  to  be  gained  should  be  an  insensitivity 
to  character  translation,  a  problem  which  plagues  many  present 
approaches . 

Finally,  the  problem  of  model  determination  for  vector-valued 
random  time  series  remains.    Some  work  on  the  problem  has  been  done, 
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and  is  described  in  [3  ],  [30  ].    A  comparative  study  of  work  referenced 
and  the  general  linear  filter  model  should  hold  some  promise  of  bene- 
fiting the  general  analysis  of  vector-valued  random  time  series  both 
in  engineering  and  other  disciplines . 
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APPENDIX  I 
OPTIMAL  ESTIMATION  OF  AN  ARBITRARY  LINEAR  COMBINATION 
OF  SIGNAL  GENERATING  PROCESS  STATE  VARIABLES 
Assume  the  usual  model  of  the  signal  generating  process  with 
state  variables    x(k)    and  noisy  observations    z(k)  =  Hx(k)  +  u(k).    It 
is  desired  to  estimate  an  arbitrary  linear  combination  of  the  state 
variables    s  =  Dx    based  on  the  noisy  observations    z.    The  optimal 
linear  estimate  (in  the  Bayes  sense,  with  a  quadratic  loss  function) 
can  be  shown  [17  ],   [35  ]   to  be  the  expected  value  of  the  conditional 
distribution 

p[s(k)/z(k),  z(k-l),   ....  z(l)] 

Since  the  process  is  linear,  and  since  the  excitation  and  noise 
sources  are  gaussian,  the  conditional  distributions  of  both    x    and    s 
given  measurements    z    are  gaussian.    The  problem  at  hand  will  be 
solved  if  the  conditional  distribution  of    s    given    z    can  be  obtained, 
from  which  the  mean  may  be  extracted. 

Consider  first  the  conditional  distribution  of    x    given  the 
measurements    z.    By  Bayes  rule, 

p[x(k)/z(k),    z(k-l),   .   .   .  z(l)] 

=  p[z(k)/x(k),z(k-l),    .    .  z(l)fo  [x(k)/z(k-l),    .    .  z(l)] 
p[z(k)/z(k-l),    .    .  z(l)] 

As  pointed  out,  the  form  of  the  left  hand  side  is  known,  and  is: 

p[x(k)/z(k),z(k-l),   .   .    .  z(l)] 

=  C  exp  -|  [(x(k)  -  x(k/k))T  P(k/k)-1     (x(k)  -  x(k/k))] 

On  the  right  hand  side, 

p[z(k)/x(k),z(k-l),   .    .   .z(l)]=p[z(k)/x(k)] 
=  C  exp  -|[(z(k)  -  Hx(k))T  R"1    (z(k)  -  Hx(k))] 
p[x(k)/z(k-l),    .    .    .  z(l)] 

=  C  exp  -|[(x(k)  -  0x(k-l/k-l))  TP(k/k-l)_1 
(x(k)  -  0x(k-l/k-l))] 
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p[z(k)/z(k-l),   .   .   .  z(l)] 

=  C  exp  -|[(z(k)  -  H  0x(k-l/k-l))T     (HP(k/k-l)HT  +    R)"1 
(z(k)  -  H  0x(k-l/k-l))] 

Defining  parameters  of  the  left  hand  side  distribution  may  be  obtained 
from  the  right  hand  side  by  matching  terms  [26  ]  .    The  Kalman  filter 
equations  are  obtained,  which  define  the  left  hand  side  distribution 
parameters  recursively,  and  one  has 

x(k)/z(k),z(k-l),   ....  z(l)    ~  N[x(k/k),  P(k/k)] 

in  terms  of  known  quantities. 

In  Anderson  [l  ]  ,  it  is  proved  that 

x   ~  N[m  ,  L]         =>      s  =  Dx   ~  N[D  fl ,  D  ED  T  ] 

The  theorem  includes  the  cases  where  x  may  have  either  a  non- 
singular  or  singular  distribution  and  D  may  be  nonsingular  or  of 
rank  less  than  the  dimension  of    s. 

Invoking  the  theorem,  the  optimal  estimate  of  s  is  seen  to  be 
Dx(k/k).  Hence  the  optimal  estimate  of  a  linear  combination  of  the 
states  is  the  same  linear  combination  of  the  optimal  estimate  of  the 
states . 
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APPENDIX  II 
THE  KALMAN  FILTER  AND  ESTIMATION  OF  A  GAUSSIAN  MEAN 

The  following  discussion  outlines  an  interpretation  of  the  Kalman 
filter  (discrete  casej^in  which  it  is  considered  to  be  a  generalized  es- 
timation of  a  Gaussian  population  mean.    The  term  "generalized"  is 
used  to  mean  that,  contrary  to  usual  assumptions  in  estimation  of 
the  mean,  the  population  to  be  estimated  here  will  be  permitted  to 
move  between  sampling  instants.    Movement  will  be  assumed  to 
consist  of  both  a  prescribed  component  and  a  random  component. 

The  analogy  to  be  described  permits  achieving  an  intuitive  feel 
for  rates  of  convergence,  effects  of  uncertainty  in  iiaatial  conditions, 
and  effects  of  random  signal  process  excitation.    Such  an  intuitive 
feel  for  filter  behavior  is  frequently  difficult  to  obtain  from  conven- 
tional formulation  of  the  estimation  problem . 

The  univariate  case  will  be  considered  first,  for  simplicity.    The 
same  analogy,  however,  carries  over  to  the  multivariate  case,  and 
will  be  considered  intfrief  detail  later. 
"No  Dynamics"Situation 

To  begin  with,  the  classical  mean  estimation  procedure  will  be 
written  in  recursive  form.    Consider  a  scalar  random  variable    Z 
where 

Z   ~  N(x,  r)  x  =  unknown  population  mean 

r  =  known  population  variance 

If    Z    is  the  sample  mean  of  a  sample  of  size    n    from  the  population, 

then 

Z   ~  N(x,  p)  where    p    =    — 

n 

Since  the  sample  mean  is  the  best  (minimum  variance)  estimate  of 

the  mean  of  a  Gaussian  population  [17  ]  ,    x    =    Z. 

Let    z(l),  z(2),   .   .   .   .  z(k)    denote  sample  values  of  the  random 

variable    Z. 
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Then 


x(k)  =  ±L    z(i)    =    x(k-l)  +  I    [z(k)  -  x(k-l)] 
KM  K 

p(k)=    £iiL    =(1-1   )p(k-l)  k  =  2,  3,   . 

k  k 


4    1 


If    b(k)  S  i.      then 
k 


x(k)  =  x(k-l)  +  b(k)  [z(k)  -  x(k-l)] 

p(k)  =  [1  -  b(k)]p(k-l)  k  =  2,  3 


It  will  be  useful  to  note  that  as  each  new  sample  value  is  con- 
sidered, the  weightings  given  it  and  the  previous  best  estimate  are 
in  inverse  proportion  to  their  respective  variances.    For  example,  at 
the    k  thi stage  above, 

£(k)  =     k^Ix^k-l)  +   I  z(k) 
k  k 

Var    x(k-l)  =  —I — 
k-1 

Var    z  (k)  =  r 

Var   5(k)  =  J_ 


k  k-1 


r    +- r 


k-1 


Weighting  given  to  sample  value  =     —  =    , 

k  1 

r  k 

r_ 
Weighting  given  to  previous  best  estimate    =       k =    k-1 


k-1 


Ah  alternate  view  is  to  note  that  weightings  given  each  sample 
value  and  the  previous  best  estimate  areiproportional  to  their  relative 
information  content,  the  reciprocal  of  their  relative  uncertainties. 
From  this  point  of  view,  the  information  content  of  a  current  best 
estimate  (the  reciprocal  of  its  variance)  is  the  sum  of  the  information 
contents  of  the  sample  and  previous  estimate. 

Next,  the  behavior  of  a  Kalman  filter  of  the  following  description 
will  be  compared  with  that  of  the  sequential  estimation  procedure  above 
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The  restrictions  that  the  transition  matrix  (a  scalar  here)  be  unity 
and  that  there  lie  no  signal  process  excitation  will  be  relaxed  later. 

x  =  scalar  H  =  h  =  1 

$=0=1  R  =  r 

Q=q=0  P=p 

The  four  equations  describing  filter  behavior  are  repeated  below  for 
reference  as  desired. 

P(k/k-l)  =  $P(k-l/k-l)  4  T+  Q 

B(k)  =  P(k/k-l)H  T  [HP(k/k-l)H  T+  R]"1 

x(k/k)  =  <|x(k-l/k-l)  +  B(k)[z(k)  -  <*x(k-l/k-l)] 

P(k/k)  =  [I  -  B(k)H]P(k/k-l) 

Three  cases  will  be  discussed,  outlining  filter  behavior  under  three 
different  levels  of  uncertainty  in  itifeial  estimates  (a  priori  input  to 
the  filter).    Single  subscripts  will  be  used  in  the  present  section 
since  there  is  no  ambiguity  under  the  conditions  assumed,  i.e. 
p(k/k)  =p(k/k-l). 

Case  1 .    Great  uncertainty  in  initial  estimate  of    x.    Let 
p(0)  ■=  ar    where    a  =  large  number 

b(l)  =  ph(hphT  +  r)_1  =_§£_=    § m     l 

(a+l)r  a+1 

p(l)  =  (1  -  b(l)h)p(0)  =  (1  -_ ^_)ar  = § M  r 

a+1  a+1 

b(2)    «-I— =  i  b(k)  =     I 

2r  — *S  k 

— -^ 

P(2)    M  (1  -i)r-JL  P(k)=-i- 

2  k 

Case  2 .     Uncertainty  in  initial  estimate  of    x   approximately 
equal  to  uncertainty  in  the  observation  process,  i.e.    p(0)  =  r. 

b(l)=-r^=    h  -^         b(k)  = 


I=>  '        k+1 

=    r  ^i^\  ^     r 


p(l)  =  [1  -b(l)]P(0)  -JL-         p(k) 


k+1 
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Case  3,    Uncertainty  in  initial  estimate  of    x    less  than  uncer- 
tainty in  observation  process. 

Let    p(0)  =     — —     where    I  is  some  positive  integer. 

I  1 


p(l)  =  (l  -b(l))p(0)  =  (1  --f- -) 


l+l  I  l+l      £  l+l 


-1  =  1 


b(2)=^L-(_I_     +r) 

l+l       l+l  l+l 

p(2)  =  (l  -b(2))p(l)  =  (1  --A-    )-I 


r+i 

i 

r 

i+i 

I 

1*4 

1 

2+4 

l+l 

__ 

r 

2+1  l+l  l+l 


b(k)  =  _L_ 

k+l 


P(k)  = 


_      r 


k+l 


In  the  Kalman  filter  for  the  particular  process  just  described, 
effects  of  varying  degrees  of  uncertainty  in  the  initial  estimate  are 
isolated  and  clearly  observed.    It  can  be  seen  that  for  great  uncer- 
tainty in  the  initial  estimate,  the  filter  behaves  asymptotically  as  the 
sequential  estimation  procedure  outlined,  i.e.    p(k)    and    b(k) 

(k  =  2,  3,     .   .  .  n)  decrease  from    r    and    1    respectively  as     —  . 

k 

Eor  uncertainty  in  the  initial  estimate  equal  to  or  less  than  uncertainty 

imposed  by  the  observation  process  such  that    p(0)  =     -r  r    (  l- 

positive  integer) ,  then  the  filter  data  weighting  and  convergence  rate 

are  the  same  as  the  data  weighting  and  convergence  rate  of  the 

sampling  estimation  scheme  from  its     (^-l)th    sample.    In  other  words, 

a  priori  information  may  be  regarded  as  providing  equivalent  additional 

observations  of  the  process  to  be  estimated. 

It  is  thus  clearly  shown  that  under  the  conditions  stated,    p   and 

b    decrease  asymptotically  as   i-.    Interpolating  between  the  assumed 

k 

rational  values  of  the  ratio   — - —      yields  a  gain  schedule  b(k)  = 


p(0)  k+a 

(a  =  — - —  )    for  the  filter  in  the  simple  situation  being  considered. 
p(0) 
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Unexcited  Dynamic  Situation 

Next,  a  mpre  general  univariate  situation  will  be  considered.    In 
the  Kalman  filter  representation,  the  restriction  above  that    0=1 
will  be  relaxed.    The  corresponding  situation  in  the  sequential  estimation 
procedure  will  be  that  during  the  sampling  process,  the  population  mean 
moves  about  in  a  prescribed  fashion,  i.e.    x(k+l)  =  0x(k).    As  pointed 
out  above,  the  sequential  estimation  procedure  forms  a  weighted  sum 
of  the  previous  best  estimate  and  the  new  data  (sample  value) ,  and 
specifies  a  new  estimate  variance  equal  to  the  "parallel  combination" 
of  the  sample  and  previous  optimal  estimate  variances. 

The  essential  difference  between  the  present  case  with  relaxed 
restrictions  on    0  and  the  case  previously  discussed  is  simply  that 
just  as  the  population  mean  is  moved  between  samples,  the  previous 
best  estimate  must  be  adjusted  by    0  prior  to  adding  in  effects  of  the 
new  sample  value. 

A  double-subscript  notation  will  be  helpful  here.    Let  x(k/k-l)  = 
The  optimal  estimate  of    x    at  sample  time    k    given  samples  up  to 
sample  time    k-1 . 

Since       x(k)  =  0x(k-l)  ,    x(k/k-l)  =  0x(k-l/k-l) 
and  since     x(k-l/k-l)    ~N(x(k-l),    p(k-l/k-l)). 


x(k/k-l)  ~N(x(k),      02p(k-l/k-l)). 


Hence, 


p  (k/k)  =    r02p(k-l/k-l)         =        rp(k/k-l) 

r  +  0s  p(k-l/k-l)  r  +  p(k/k-l) 

=  b(k)r 

The  resulting  gains  with  which    x(k/k-l)    and    z(k)    are  added  are 

pfcA?       and     pfrA) 

p(k/k-l)     and  r  ' 

No  additional  discussion  of  the  Kalman  filter  is  required  for  this 
situation,  where    0  is  not  constrained  to  be  unity.    If  the  filter  a 
priori  variance    p(0)    equals    p(k/k-l)    of  the  sequential  estimation 
procedure  for  any    k,    the  filter  will  behave  from  the  beginning  just 
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as  the  estimation  scheme  does  from  the    k  th    stage  on. 
Randomly  Excited  Dynamics 

The  situation  may  be  made  still  more  general  by  relaxing  the 
restriction  in   A    that    q  =  0.    In  the  sequential  estimation  procedure, 
the  interpretation  is  that  not  only  will  the  population  mean  move  in  a 
prescribed  fashion  between  sampling  instants,  but  will  have  an 
additional  random  motion  imparted  to  it  by  a  gaussian  excitation  in- 
dependent from  sample  to  sample. 

Again,  no  additional  discussion  of  the  Kalman  filter  is  required. 
Further,  treatment  of  theipresent  situation  in  the  sequential  estimation 
procedure  is  a  direct  extension  from    B. 

The  first  few  steps  are 

P(D  =r 

b(l)  =  1 

p(2/l)  =  03r  +  q 

p(2/2)  =     (03r  +  q)r 

(02+  l)r  +q 

b(2)  =  p(2/2)         =    03r  +  q 

r  (02  +  l)r  +  q 

etc. 

The  steps  in  general  are 

p(k/k-l)  =  03p(k-l/k-l)  +  q 

p(k/k)  =    p(k/k-l)r 

p(k/k-l)  +r 

b(k)  =    p(k/k-l) 

p(k/k-l)  +  r 

The  general  updating  steps  agree  with  those  of  the  Kalman  filter. 
Again,  if  the  filter  a  priori  variance    p(0)    equals    p(k/k-l)    of  the 
sequential  estimation  procedure  for  some    k,  the  same  remark  as 
in    B    is  valid.    For  the  present  situation,    p    does  not  continue  in- 
definitely to  decrease,  but  reaches  a  non-zero  steady  state  value. 
This  steady  state  value  of    p    and  the  corresponding  steady  state 
gain  may  be  computed  by  setting    p(k+l/k+l)  =  p(k/k)    and  solving. 
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The  Multivariate  Case 

Discussion  of  the  multivariate  case  will  be  limited  to  situations 
where    H  =  I.    Though  this  excludes  cases  of  major  interest  where 
the  Kalman  filter  has  been  employed,  the  additional  detail  required 
would  obscure  the  comparison  sought  here. 

Under  the  frequent  assumptions  of  diagonal    P(0)    and    R    matrices, 
the  case  where     $  =  I    and    Q  =  0    (no  dynamics)  really  amounts  to    n 
uncoupled  problems  of  the  type  covered  above,  and  is  of  little 
additional  interest.    Ho  has  shown  [15  ]  that    P    decreases  asymp- 
totically as   -I   for  this  case  without  the  assumptiin  of  a  diagonal 
k 

P(0). 

The  randomly  excited  dynamic  situation  is  of  more  general 
interest  and  is  a  direct  extension  of  the  approach  in  the  univariate 
case.    As  before,  if  the  filter  a  priori  covariance  matrix    P(0)  = 
P(k/k-l)    of  the  sequential  estimation  procedure  for  any    k,  its 
effect  may  be  regarded  as  supplying    k    equivalent  additional  ob- 
servations to  the  filter.    Adjustment  of  the  optimal  estimate  covariance 
between  samples  is: 

P(k/k-l)  =  &P(k-l/k-l)  4T+  Q. 
Updating  of  the  covariance  matrix  and  computation  of  the  new  gain 
is  also  a  direct  extension: 

Univariate  case:  Multivariate  case: 

p(k/k)  =  p(k/k-l)r  P(k/k)  =  P(k/k-l)(P(k/k-l)  +  R)_1R 

p(k/k-l)  +  r 

b(k)  =  p(k/k)  B(k)  =  P(k/k)R-1 

r 

A  check  with  the  filter  equations  above  will  disclose  that  this  is 

exactly  the  same  updating  sequence  as  in  the  Kalman  filter. 
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